Before moving on to commanding the ZX Spectrum’s graphics hardware more directly, it’s time to settle some scores.

C64 version of the umbrella banner

One of the things I particularly liked about the ZX Spectrum’s BASIC is that it provides pretty powerful commands that let you directly work with everything the system has to offer, and it lets you do it quite easily. The closest we got to having to do an end run around BASIC here was POKEing in the graphics for the umbrella—and even that was just using it to fill a buffer that BASIC itself gave us. Commodore 64 BASIC, rather infamously, offers basically no direct support for its own hardware. Everything is done via memory-mapped I/O with POKE and PEEK, directly. Furthermore, in an unpleasant number of cases, it’s really necessary to supplement the BASIC program with some machine language support programs to keep everything running at an acceptable speed.

So I got to thinking. What does it take to match this simple display from the TI-99/4A, and later the MSX and ZX Spectrum, on the C64? Obviously it can do it. But what does it take to do it in 100% BASIC, and how idiomatic can we make it?

Defining the Problem

The C64’s default text mode is markedly unlike the ones offered by the TMS9918A chip in the TI and MSX, and it also differs from the ZX Spectrum’s treatment of character attributes. In its normal text mode, the C64 has a single background color for the entire screen, and only the foreground color may vary. This can be bent in its Extended Background Color Mode, or bypassed entirely in its full bitmap mode. The C64’s bitmap mode uses what is normally the text display almost exactly the same way the ZX Spectrum uses its attribute table, in fact, so the most direct solution would be to switch into the bitmap mode and draw the display directly there.

This is an extremely drastic step, though, and the C64 Programmer’s Reference Guide is quite explicit that the bitmap mode is impractical to use in BASIC. We just need to shift our perspective a bit. Instead of thinking of this as a screen split between regions with a green background and regions with a black background, we should think of it as a screen where the whole background is black, but most of the screen is covered with solid green foreground. We don’t even need extended color mode for this one.

To get the umbrella graphic itself, we redefined four characters and assigned them appropriate colors. This is a bit of legerdemain on the part of the ZX Spectrum, since its display is always exclusively a bitmap mode and “characters” are simply conveniently predefined bit patterns, but the end result matches what we saw on the TI and what we intend here on the C64: the actual rendering of the banner graphic is done purely with PRINT statements.

So, for a graphic that matches the original TI display and logic as closely as possible, here is our general plan:

  • Copy the mixed-case character set out of ROM and into RAM.
  • Redefine a few of the characters in our RAM copy to give us our graphic.
  • Switch the active character set to our modified RAM copy. At this point we have replicated the CALL CHAR code in the TI original.
  • Turn the screen black, the border green, and fill the screen with green inverted spaces.
  • Draw the banner with a series of PRINT statements.
  • Wait for a keypress.
  • Set the active character set back to default PETSCII, restore the border, background, and text colors, and clear the screen so that BASIC has returned to its normal state.

Seems easy enough, and we can prove the strategy out first by writing it in assembly language. The whole program is about 100 lines long—a bit under twice as long as the ZX Spectrum version we made last week—but it shows our idea here is sound.

A First Draft

Setting up a custom character set in BASIC is a task that’s common enough that there’s stock code for doing it. Here’s a typical one, very lightly adapted from Sheldon Leemon’s Mapping the C64 and encoded using the input format that the VICE project’s petcat utility expects:

  10 poke 828,peek(56):poke 56,48:clr:z=peek(828)
  20 poke 56333,127:poke 1,peek(1) and 251:for i=0 to 2047
  30 poke 12288+i,peek(55296+i):next:poke 1,peek(1) or 4:poke 56333,129
  40 for i=0 to 31:read a:poke 12568+i,a:next:poke 53272,28
1000 rem bumbershoot graphic
1010 data 3,15,31,63,127,127,255,255
1020 data 192,240,248,248,240,224,192,128
1030 data 255,254,124,120,48,0,0,0
1040 data 128,192,96,48,24,8,56,0

We need to set aside some RAM to hold our custom character set, and to do that we’ll need to modify BASIC’s memory layout. We’d like to restore it afterwards, though, so line 10 here prepares that. It records the original top of BASIC RAM in a scratch byte inside what is normally the cassette loading buffer, reconfigures BASIC to only have 10KB of memory (2048-12287, more than enough for us), then reloads the value out of the buffer into a proper BASIC variable. That little jaunt through the cassette buffer is necessary because reconfiguring BASIC’s memory ends up deleting all its variables too.
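The arithmetic behind line 10 is easy to check off the machine. This is a host-side Python sketch, not C64 code; it assumes, as the paragraph above does, that BASIC RAM starts at 2048, and it relies on location 56 holding the high byte of BASIC's top-of-memory pointer.

```python
# Host-side check of the memory arithmetic in line 10 (not C64 code).
PAGE = 256
BASIC_START = 2048   # assumed stock start of BASIC RAM, per the text


def basic_top(high_byte):
    """First address past BASIC RAM when location 56 holds high_byte."""
    return high_byte * PAGE


top = basic_top(48)
print(top)                         # 12288: where our character set will live
print((top - BASIC_START) / 1024)  # 10.0 KB left for the BASIC program
```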

Lines 20 and 30 copy the mixed-case character set into the RAM we reserved, where it occupies 12288-14335. They disable interrupts, map in the character ROM, copy it over, map the character ROM back out, and re-enable interrupts afterwards. BASIC doesn’t really let you control the interrupt mask, but as I discussed in my old article about configuring IRQs from BASIC, there’s usually only one interrupt source that’s on, so you may simply turn it off for the duration. We have to disable interrupts during this process because the character ROM gets mapped over what is supposed to be all our I/O ports, and the system’s clock-tick handler will get very confused if the I/O devices are not there when it runs.

Finally, line 40 reads in the character data from the end of the program, POKEs it into place over the characters #$%&, which we otherwise are not using, and then points the display at our RAM character set. Lines 1000-1040 are the data itself, straight from the ZX Spectrum BASIC listing.
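The two magic numbers in line 40, 12568 and 28, both fall out of simple arithmetic. Here is a host-side Python sketch of it (again, not C64 code). The function names are mine, and it assumes the default VIC bank, where the chip sees addresses 0-16383.

```python
# Host-side sketch (not C64 code) of two values used in line 40.
CHARSET_BASE = 12288   # RAM copy of the character set


def glyph_address(screen_code, base=CHARSET_BASE):
    """Each glyph is 8 bytes, stored in screen-code order."""
    return base + 8 * screen_code


def d018_value(screen_addr, charset_addr):
    """Value for location 53272: screen address in 1K units in the high
    nibble, character set address in 2K units in bits 1-3."""
    return ((screen_addr // 1024) << 4) | ((charset_addr // 2048) << 1)


print(glyph_address(35))         # 12568: '#' is screen code 35
print(d018_value(1024, 12288))   # 28: screen at 1024, charset at 12288
```

The same function gives 20 for the character ROM image at 4096, which is the value the cleanup code POKEs back on exit.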

Now to set up the “blank” screen. We need to hit every character cell on the screen, so we can’t just PRINT or we’ll end up scrolling the top of our output away. We’ll directly POKE our inverted spaces and color codes into video RAM instead, after setting up the border and background colors:

  50 poke 53280,5:poke 53281,0
  60 for i=0 to 999:poke 1024+i,160:poke 55296+i,5:next

Now to print out the banner. This is a jumble of text control codes and on-keyboard graphics characters, but fortunately petcat is really good at those:

  70 print "{home}{9 down}{6 rght}{grn}{rvon}{CBM-D}{rvof}{26 space}{rvon}{CBM-F}"
  80 print "{6 rght}  {lblu}#${wht}  BUMBERSHOOT SOFTWARE  "
  90 print "{6 rght}  {lblu}%{orng}&{wht}    We adore our 64!    "
 100 print "{6 rght}{grn}{rvon}{CBM-C}{rvof}{26 space}{rvon}{CBM-V}"

We then wait for a keystroke and restore our display and original memory configuration on the way out.

 110 get a$:if a$="" then goto 110
 120 print "{lblu}{clr}";:poke 53280,14:poke 53281,6:poke 53272,20
 130 poke 56,z:clr

And this works! Sort of. But it’s awful; it takes nearly a full minute to produce the display, and for the first half or so of it RUN/STOP is disabled. Even filling the screen takes forever and ends up being an extended animation:

A slowly filling text screen

The VICE monitor includes a “stopwatch” in its register dump that measures how many CPU cycles have executed since poweron; this can be used as something very close to a microsecond timer. This isn’t a machine language program, so setting breakpoints isn’t as useful as we’d hope, but I can trap the POKEs with watchpoints instead. Some simple subtraction from there gives us our times, and they don’t look good: the character set copy takes 35 seconds, and filling the screen an additional 20.
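The "simple subtraction" is just a difference of two stopwatch readings divided by the CPU clock rate. A host-side Python sketch follows; the readings in the example are made up for illustration, and the clock constants are the nominal PAL and NTSC C64 rates.

```python
# Host-side sketch: turn two cycle-counter readings into elapsed seconds.
PAL_HZ = 985_248       # nominal C64 CPU clock, PAL
NTSC_HZ = 1_022_727    # nominal C64 CPU clock, NTSC


def elapsed_seconds(start_cycles, end_cycles, clock_hz=PAL_HZ):
    """Elapsed wall-clock time between two stopwatch readings."""
    return (end_cycles - start_cycles) / clock_hz


# Hypothetical readings taken at the first and last POKE of a loop.
print(elapsed_seconds(2_000_000, 36_483_680))
```

Since the clock is so close to 1 MHz, reading the cycle difference directly as microseconds is off by only a couple of percent either way.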

Surely we can do better than this.

Doing Less Work

We aren’t using all 256 characters, so we don’t really need to copy them all over. We’re using enough of the alphabet and punctuation that we may as well copy that part in bulk. Character order in the character ROM differs slightly from PETSCII, so characters 64-95 come before characters 32-63… but that also means that we can load just characters ‘A’ through ‘6’ and then our five graphics characters at codes 160, 236, 251, 252, and 254. (Codes 251-254 are adjacent, so it’s simplest to copy that run whole.) To make this change, we replace line 30 with this block:

  30 for i=8 to 439:poke 12288+i,peek(55296+i):next
  31 for i=1280 to 1287:poke 12288+i,peek(55296+i):next
  32 for i=1888 to 1895:poke 12288+i,peek(55296+i):next
  33 for i=2008 to 2039:poke 12288+i,peek(55296+i):next
  39 poke 1,peek(1) or 4:poke 56333,129

This dramatically drops the charset-copy time: it now only takes about 8.5 seconds, which in turn means the total execution time is halved. That’s still far too much, though.
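The loop bounds in lines 30-33 come straight from the screen codes: each glyph is 8 bytes, so code N lives at offsets 8N through 8N+7. A host-side Python sketch of that bookkeeping (the helper name is mine):

```python
# Host-side sketch: byte offsets covered by the partial character set copy.
GLYPH_BYTES = 8


def glyph_span(first, last=None):
    """Inclusive byte offsets covering screen codes first..last."""
    if last is None:
        last = first
    return GLYPH_BYTES * first, GLYPH_BYTES * last + GLYPH_BYTES - 1


spans = [glyph_span(1, 54), glyph_span(160), glyph_span(236), glyph_span(251, 254)]
copied = sum(hi - lo + 1 for lo, hi in spans)
print(spans)    # [(8, 439), (1280, 1287), (1888, 1895), (2008, 2039)]
print(copied)   # 480 bytes, down from 2048
```

Copying 480 bytes instead of 2048 is a bit under a quarter of the work, which is roughly in line with the drop from 35 seconds to about 8.5.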

Machine Language Helpers

The usual solution when BASIC is too slow is to drop down into machine code to do the expensive work. I even pointed that out and gave a small routine to do it when writing about BASIC coexisting with binary data blocks:

A machine language program can do this in milliseconds, so it’s really a far better use of your time to read in a tiny machine language program to do this copy. It was scandalously uncommon to do this, even in programs that freely mixed BASIC and machine code. When I dug through a stack of old BASIC programs to see what they did to operate acceptably, the fact that they basically all did the charset copy in BASIC was the thing that surprised me the most. It’s not even terribly onerous!

We’ve got more work to do than just the character-set copy, of course; we also have to fill the screen. Even so, we’re looking at very simple loops that only fill or copy buffers. We can replace lines 20-39 and line 60 with this routine, placed neatly into the same cassette buffer we used as temporary storage at the very start:

        .org    $033c

        ;; Copy mixed-case charset to $3000
        sei
        lda     $01
        and     #$fb
        sta     $01
        ldx     #$08
        ldy     #$00
lp:     lda     $d800,y
        sta     $3000,y
        dey
        bne     lp
        inc     lp+2
        inc     lp+5
        dex
        bne     lp
        lda     $01
        ora     #$04
        sta     $01
        cli

        ;; Fill screen with 1,000 green inverted spaces
        ldx     #$04                    ; 4 page pairs edited
        ldy     #$e7                    ; Last page is truncated
lp2:    lda     #$a0
        sta     $0700,y
        lda     #$05
        sta     $db00,y
        dey
        cpy     #$ff
        bne     lp2
        dec     lp2+4
        dec     lp2+9
        dex
        bne     lp2
        rts

It’s self-modifying code that destroys itself as it runs, but that’s fine because we’ll be loading it into place on each run and only running it once. This isn’t a task for the onefiling techniques I looked at a few months ago; it’s one for my much older BASIC loader generator. The lines we removed are replaced with these:

 20 for i=828 to 892:read a:poke i,a:next i:sys 828
900 rem machine language cheat code
910 data 120,165,1,41,251,133,1,162,8
920 data 160,0,185,0,216,153,0,48,136
930 data 208,247,238,73,3,238,76,3,202
940 data 208,238,165,1,9,4,133,1,88,162
950 data 4,160,231,169,160,153,0,7,169
960 data 5,153,0,219,136,192,255,208
970 data 241,206,104,3,206,109,3,202
980 data 208,232,96
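For anyone without a loader generator handy, the idea is small enough to sketch. This is a minimal host-side Python version, not the generator mentioned above; the function name, the starting line number, and the nine-values-per-line layout are arbitrary choices of mine. The byte list is the assembled routine from the listing.

```python
# Host-side sketch: format assembled bytes as BASIC DATA lines.
ROUTINE = bytes([
    120, 165, 1, 41, 251, 133, 1, 162, 8, 160, 0, 185, 0, 216, 153, 0, 48,
    136, 208, 247, 238, 73, 3, 238, 76, 3, 202, 208, 238, 165, 1, 9, 4, 133,
    1, 88, 162, 4, 160, 231, 169, 160, 153, 0, 7, 169, 5, 153, 0, 219, 136,
    192, 255, 208, 241, 206, 104, 3, 206, 109, 3, 202, 208, 232, 96,
])
LOAD_ADDRESS = 828   # start of the cassette buffer, $033c


def data_lines(code, first_line=910, step=10, per_line=9):
    """Return DATA statements holding the bytes of code, in order."""
    lines = []
    for n, i in enumerate(range(0, len(code), per_line)):
        values = ",".join(str(b) for b in code[i:i + per_line])
        lines.append(f"{first_line + n * step} data {values}")
    return lines


last = LOAD_ADDRESS + len(ROUTINE) - 1
print(f"20 for i={LOAD_ADDRESS} to {last}:read a:poke i,a:next i:sys {LOAD_ADDRESS}")
print("\n".join(data_lines(ROUTINE)))
```

The loop bound is derived from the length of the routine, so trimming the machine code later only means regenerating the lines.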

This dramatically improves our setup: the total setup time has dropped to a second, the machine code itself runs in about 350ms, and the screen fills in an eyeblink. The rest of the setup time is just POKEing the machine code into place in the first place.

Not bad at all! There’s just the matter of how we are more machine code, now, than actual BASIC. Let’s start clawing back some ground.

Tuning the BASIC code

Some time ago I wrote a primer specifically on performance-tuning C64 BASIC code. It even had a section specifically on attacking this problem of filling the screen. There are two main insights there: one of them is that we shouldn’t be putting constant numbers or math operations in our loops if we can help it, because those are more work and parsing numbers in particular is quite slow. The other is that PRINT statements are extremely fast, and we should rely on them as much as we can.

We get a few fixes in this time: we rework our loading routines to do less math inside them, delete the second part of our machine language code (lines 950-980, with one edit on line 940) so now it only copies over the character set, and copy over the screen-filling code from that old article. Here are the edits:

 20 for a=828 to 864:read b:poke a,b:next:sys 828
 30 for a=12568 to 12599:read b:poke a,b:next
 40 poke 53280,5:print "{grn}{clr}";:poke 53281,0
 45 for a=1 to 24:print "{rvon}{40 space}";:next a
 50 print "{rvon}{39 space}{home}";
 55 poke 2023,160:poke 56295,5
 60 poke 53272,28
940 data 208,238,165,1,9,4,133,1,88,96

This fails dramatically, and in a way I didn’t see in my older work:

The banner split into separate strips

I’m pretty sure that what’s happened here is that by never actually printing a newline, I’ve convinced the C64’s screen editor that I wanted this all to be one single line. The screen editor can only handle 80 characters per logical line, so the end result of our PRINT statements here is 12 “logical” lines that fill the 24 main lines of the screen, and then a half-line at the very bottom. The cursor keys still all work as normal, but when we print a carriage return like we do at the end of each PRINT statement in lines 70-100, it moves us to the start of the next logical line. We can test this by adding or removing one of our cursor-down characters in line 70, and the result vindicates our theory: if we do that, the first line doesn’t end in a skip (because we’re now starting in the back half of a logical line), but all the rest of them do. There are a lot of ways we could fix this or work around it; I decided to dodge the problem by printing 39 characters on each line instead of 40, then filling in the whole rightmost column with POKEs in a separate step.
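The count of logical lines is quick to confirm. This host-side Python sketch only does the division; it does not model the screen editor itself, and the 80-character limit is the one stated above.

```python
# Host-side sketch: how the unbroken run of characters splits up.
COLUMNS = 40
LOGICAL_MAX = 80   # longest logical line, per the text

printed = 24 * COLUMNS + 39          # 24 full rows, then 39 spaces
full_lines, leftover = divmod(printed, LOGICAL_MAX)
rows_covered = full_lines * (LOGICAL_MAX // COLUMNS)
print(full_lines, rows_covered, leftover)   # 12 24 39
```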

It’s also kind of annoying to be visibly seeing the screen fill up; even though it’s way faster now it’s still clearly visible. We can do a little trick there, too: we can turn the background green after we clear the screen so it looks like we fill it instantaneously. Only once the screen is full will we change the background color to black.

These are our new edits, with line 50, whose job was filling the bottom of the screen, left intact:

  40 poke 53280,5:print "{grn}{clr}";:poke 53281,5
  45 for a=1 to 24:print "{rvon}{39 space}":next a
  55 for a=1063 to 2023 step 40:poke a,160:poke 54272+a,5:next a
  60 poke 53272,28:poke 53281,0

This is more expensive than the machine language; it’s costing us 900ms to fully clear the screen here. On the other hand, loading the machine language into place was expensive; our total time loss is more like half a second, not one, and this BASIC screen filler is over twenty times faster than our initial all-BASIC loop, which needed 20 seconds!
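The bounds on the column-filling loop can be checked the same way as the other constants. A host-side Python sketch, with names of my own choosing:

```python
# Host-side sketch: the cells touched by the rightmost-column loop.
SCREEN = 1024
COLOR_RAM = 55296
COLOR_OFFSET = COLOR_RAM - SCREEN    # the 54272 added inside the loop

column = list(range(SCREEN + 39, SCREEN + 1000, 40))
print(len(column), column[0], column[-1])   # 25 1063 2023
print(column[0] + COLOR_OFFSET)             # 55335: color cell for row 0
```

That is 25 pairs of POKEs, one per screen row, which is a far smaller job than the 1,000 pairs in the first draft.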

Not Copying the Character Set At All

Filling the screen in BASIC is a little indirect, but we can definitely do it. Moving the character set copy into BASIC, though, seems to be ruinous. Are we really obliged to rely on machine code to do this?

No. No, we are not. Not because we can do it any faster in BASIC, but instead because we do not need to rely on custom character sets at all. The C64’s sprite system is really quite convenient and I’ve already made good use of it in static displays. Sprites are a lot more amenable to use in BASIC, and there’s a lot less memory we have to touch to use them.

First things first: we need a place to put the sprite definitions. Happily, there’s enough space in that cassette buffer we keep using to store three of those definitions—and with the machine code gone, we don’t need to worry about sharing the space, either. This also has the happy effect of putting all our hardware I/O buffers out of BASIC’s default memory span, so we don’t even need to reconfigure its memory any more.

Sprites are stored in 64-byte blocks, of which only the first 63 matter. The cassette buffer starts at memory location 828, but the available sprite addresses are 832, 896, and 960. We’ll be using the high-resolution monochrome mode for our umbrella, which in turn means that we need two of these three locations (one for the blue umbrella, and one for the orange handle). On the plus side, since they are 24×21 and our full image is only 16×16, we should be able to just stack both sprites in exactly the same place.
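Those three addresses are simply the multiples of 64 that fall between the start of the cassette buffer and the start of screen memory. A host-side Python sketch; the helper name is mine, and using 1024 (the default screen address) as the upper bound is my assumption about where to stop looking.

```python
# Host-side sketch: usable sprite blocks and their pointer values.
SPRITE_BLOCK = 64
BUFFER_START = 828
SCREEN = 1024      # default screen memory; used here only as an upper bound


def sprite_pointer(address):
    """Pointer value for a sprite definition stored at address."""
    if address % SPRITE_BLOCK:
        raise ValueError("sprite data must start on a 64-byte boundary")
    return address // SPRITE_BLOCK


blocks = [a for a in range(BUFFER_START, SCREEN) if a % SPRITE_BLOCK == 0]
print(blocks)                                # [832, 896, 960]
print([sprite_pointer(a) for a in blocks])   # [13, 14, 15]
```

Pointer values 13 and 14 are the ones that go into locations 2040 and 2041 in the final listing.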

Except for the code that fills the screen with green squares, pretty much the entire code is different now. Configuring the sprites is a giant pile of bespoke POKEs in lines 140-170; for full details on these you can read my spritework overview or, honestly, consult almost any introductory C64 graphics tutorial. Here’s the final source:

 10 for a=832 to 958:poke a,0:next
 20 for a=832 to 877 step 3:read b:poke a,b:next
 30 for a=833 to 854 step 3:read b:poke a,b:next
 40 for a=921 to 942 step 3:read b:poke a,b:next
 50 poke 53280,5:print "{grn}{clr}";:poke 53281,5
 60 for a=1 to 24:print "{rvon}{39 space}":next a
 70 print "{rvon}{39 space}{home}"
 80 for a=1063 to 2023 step 40:poke a,160:poke 54272+a,5:next a
 90 poke 53272,22:poke 53281,0
100 print "{8 down}{6 rght}{grn}{rvon}{CBM-D}{rvof}{26 space}{rvon}{CBM-F}"
110 print "{6 rght}{6 space}{wht}BUMBERSHOOT SOFTWARE  "
120 print "{6 rght}{8 space}We adore our 64!    "
130 print "{6 rght}{grn}{rvon}{CBM-C}{rvof}{26 space}{rvon}{CBM-V}"
140 poke 2040,13:poke 2041,14
150 poke 53248,88:poke 53249,130:poke 53250,88:poke 53251,130
160 poke 53264,0:poke 53271,0:poke 53276,0:poke 53277,0
170 poke 53287,14:poke 53288,8:poke 53269,3
180 get a$:if a$="" then 180
190 poke 53269,0:poke 53280,14:print "{lblu}{clr}";:poke 53281,6:poke 53272,20:end
200 rem bumbershoot graphic
210 data 3,15,31,63,127,127,255,255
220 data 255,254,124,120,48,0,0,0
230 data 192,240,248,248,240,224,192,128
240 data 128,192,96,48,24,8,56,0

The final time spent is a bit over a second and a half, with about half of that spent on setting up the sprites and the rest spent on preparing the text screen. It’s slower than the machine code versions, of course, but the user will barely have time to notice the delay before the work is complete. I think we’re within parameters, and I’m comfortable declaring victory here.
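For the record, here is where the loop bounds in lines 20-40 and the coordinates in line 150 come from. Sprite rows are three bytes wide, and an unexpanded sprite at X=24, Y=50 sits exactly on the top-left text cell. This is a host-side Python sketch with helper names of my own:

```python
# Host-side sketch: sprite byte addresses and screen coordinates.
def sprite_byte(base, row, col):
    """Address of byte col (0-2) in row (0-20) of a sprite at base."""
    return base + 3 * row + col


def sprite_xy(text_col, text_row):
    """Sprite coordinates that align with a text cell."""
    return 24 + 8 * text_col, 50 + 8 * text_row


left = [sprite_byte(832, r, 0) for r in range(16)]        # line 20
top_right = [sprite_byte(832, r, 1) for r in range(8)]    # line 30
handle = [sprite_byte(896, r, 1) for r in range(8, 16)]   # line 40
print(left[0], left[-1])            # 832 877
print(top_right[0], top_right[-1])  # 833 854
print(handle[0], handle[-1])        # 921 942
print(sprite_xy(8, 10))             # (88, 130)
```

Text column 8, row 10 is where the first draft printed the `#` character, so both sprites land right on top of the space the custom characters used to occupy.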

Sudden Death Overtime

This is a significantly less reasonable program than the one we wrote for the ZX Spectrum, or even the TI and MSX versions. There’s a couple of things going on here, I think. The first is a phenomenon I noted when looking at the Exidy Sorcerer: simpler hardware can provide more intuitive controls. While the C64’s graphical capabilities are a strict superset of the Spectrum’s, those capabilities aren’t the default and aren’t very accessible by BASIC or even, honestly, in machine code. Juggling the necessary work is an ordeal, and the greater capabilities also generally place higher expectations upon the work itself.

It also doesn’t help that C64 BASIC offers no affordances whatsoever beyond being able to move the cursor around and set the foreground color; everything else is done via memory-mapped I/O. Every platform needed to do some POKEing to get the graphics working—the TI hides it a little, but we pass CALL CHAR a hex dump and there’s no mistaking what’s going on—but the C64 is doing very little else.

This was noticed at the time, of course, and over the years the C64 got a number of nonstandard extended BASICs to provide easier access to those capabilities. The Commodore 128’s BASIC actually included most of the facilities that had been standard on the PC since the VIC-20 days. As it is, though, I feel like the main thing we’ve proven here is that BASIC programs on the C64 almost have to be BASIC/ML hybrids.

The Spectrum can be pretty smug about all this; it ended up acquitting itself the best of all four platforms when it wasn’t even the hardware originally targeted by the display. Its BASIC implementation of this was more straightforward than even the TI version, where we were simply putting the default graphics mode through its paces using the commands that TI BASIC offered. It can’t get too smug, though, because it’s the only one of the four without hardware sprites, and while the static displays are nice, smooth animation is much harder.

We’ll get there, before we’re done. But that goal is a ways off yet.