Introduction

For compatibility with existing Spectrum software, the SAM Coupé shares a number of hardware features with the Spectrum. These include a common video mode (SAM's mode 1) and I/O ports (keyboard, border, attr), allowing most Spectrum software to run unchanged.

A key difference between the machines is CPU speed: the original Spectrum has a 3.5MHz Z80A, and the SAM a faster 6MHz Z80B. This speed difference could easily break timing-sensitive code, causing problems such as unexpected interrupt re-entrancy. For this reason, the SAM was designed to run at a reduced speed in mode 1.

The SAM technical manual doesn't cover the speed reduction, so a test was needed to determine exactly where the delays occurred. My curiosity was driven by a desire to implement the feature in SimCoupe, which would need a precise understanding of the effect.

The Code

A few quick tests in the upper border area confirmed that the delays are present only in mode 1. By monitoring the timing of code across the display width, it would be possible to determine where the delays were in effect.

The following code fragment is designed to be executed in the border area just above the main screen, positioned using a fixed delay from the frame interrupt:

        ld  de,#0002   ; 00 = black, 02 = red
        ld  a,e        ; red

                       ; -- NOPs added here --

        out (254),a    ; border red
        ld  a,d        ; black
        out (254),a    ; border black

Performing the test in the border area prevents memory contention in the main display area from affecting the results. This contention is between the CPU and the ASIC, as they both attempt to access memory at the same time. Drawing the display is time-critical, so the ASIC has priority over the CPU, delaying the CPU by 4 tstates.

For each run of the test code, the start position and length of the red border bar are noted. An extra NOP is then added to the code position indicated above, and the test is repeated. This is continued until the bar advances up to the right border, by which point we should have enough sample data.

To be thorough, the test was repeated on different border lines, to ensure the delays were the same on each. With this confirmed, it was time to work on the results.

Results

The diagram below shows how the bar position and length changed with each NOP added:

[Diagram: Test results]

Each block above represents a single display 'cell', which is an indivisible row of 8 pixels (as far as the ASIC is concerned, this also applies to the border area). Each cell is 8 tstates in length, which is two Z80 machine cycles.

The values at the top show the screen position relative to the start of a TV scan-line. The full range is 0 to 47, but not all of the left and right border areas can be seen on a standard TV. I've included only the cells visible on my own TV, and as used by the test.

Even at this stage a definite pattern can be seen - at 8 cell intervals it alternates between 2 and 4 cells in length. The start position also advances one cell per test when the length is 4, or every two tests when the length is 2.

Timings

To explain the difference in bar length, we need to examine the timing of the code that determines the length, i.e. the code between the two OUT instructions that form the bar.

In the fastest possible case, with no additional delays, we have:

        out (n),a
                    ; -- border bar visible --

        ld  a,d     ; opcode read    4
        out (n),a   ; opcode read    4
                    ; operand read   4
                    ; port write     4

                    ; -- border bar hidden --

This gives a total of 16 tstates for the bar length. With the SAM display taking 384 tstates per line of 48 cells, the 16 tstate block represents 2 screen cells. This matches the minimum bar length we see, so no delays are present during this time.

In the slowest possible case, with delays affecting all memory and I/O accesses, we have:

        out (n),a
                    ; -- border bar visible --

                    ; delay          4
        ld  a,d     ; opcode read    4
                    ; delay          4
        out (n),a   ; opcode read    4
                    ; delay          4
                    ; operand read   4
                    ; delay          4
                    ; port write     4

                    ; -- border bar hidden --

This gives a total of 32 tstates, or 4 cells in length - this also matches the maximum bar length, suggesting the delays apply over the whole area for bar lengths of 4.
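The two totals can be checked with a few lines of Python. This is only a sketch of the arithmetic above: the list of accesses and the 4 tstate costs come from the listings, while the names (`ACCESSES`, `bar_tstates`, `bar_cells`) are made up for the example.

```python
TSTATES_PER_LINE = 384
CELLS_PER_LINE = 48
TSTATES_PER_CELL = TSTATES_PER_LINE // CELLS_PER_LINE  # 8 tstates per cell

ACCESS_TSTATES = 4  # each access in the listings takes 4 tstates
DELAY_TSTATES = 4   # each delay in the listings adds 4 tstates

# Accesses made between the two border changes
ACCESSES = [
    "ld a,d opcode read",
    "out (n),a opcode read",
    "out (n),a operand read",
    "out (n),a port write",
]


def bar_tstates(delayed: bool) -> int:
    """Bar length in tstates when every access is, or is not, delayed."""
    per_access = ACCESS_TSTATES + (DELAY_TSTATES if delayed else 0)
    return per_access * len(ACCESSES)


def bar_cells(delayed: bool) -> int:
    """Bar length in display cells."""
    return bar_tstates(delayed) // TSTATES_PER_CELL


if __name__ == "__main__":
    for delayed in (False, True):
        print(delayed, bar_tstates(delayed), bar_cells(delayed))
```

The model only covers the two extremes (no accesses delayed, or all of them delayed).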

Analysis

Using the timing information above, we can start to form a picture of the timings across the width of the display. We start by marking the known border change positions, and work to fill the gap between them.

Accessing some I/O ports requires the attention of the ASIC to satisfy the request. Ports 240 to 255 fall into this range, and incur a 4 tstate delay if not aligned to the last 4 tstates in an 8 tstate cell. Writing to the border port (254) is affected by this, so we know each of our port writes must occur in the 4 tstate block immediately before the change in border colour.
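The port alignment rule can be expressed as a small function. The sketch below makes two assumptions that are not stated in the text: cell boundaries fall on multiples of 8 tstates in whatever time base `tstate` uses, and every access starts on a 4 tstate boundary (so the offset within a cell is either 0 or 4). The name `port_delay` is hypothetical.

```python
TSTATES_PER_CELL = 8
ASIC_PORTS = range(240, 256)  # ports 240 to 255 need the ASIC's attention


def port_delay(port: int, tstate: int) -> int:
    """Delay in tstates before an I/O access to `port` starting at `tstate`.

    Assumes accesses start on 4 tstate boundaries and that cells start
    at multiples of 8 tstates (illustrative assumptions).
    """
    if port not in ASIC_PORTS:
        return 0  # no ASIC involvement, so no alignment delay
    offset = tstate % TSTATES_PER_CELL
    # Only the last 4 tstates of a cell are free of the delay
    return 0 if offset >= 4 else 4


if __name__ == "__main__":
    for t in (0, 4, 8, 12):
        print(t, port_delay(254, t))
```

With this rule a border write always completes at the end of a cell, which is why the border colour changes line up with cell boundaries in the results.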

We already know the details for bar lengths of 2 and 4, as calculated in the previous section. These can be entered in the table below as "lonp" and "XlXoXnXp", respectively (see the key following the table). For each 'X' present in the table (except before 'p'), we can enter an 'X' in the same column for any test that shares the column. The 'p' case is an exception because the delay before it may simply be due to the normal ASIC delay for the border write.

This leaves us with just a few gaps, most of which are obvious from the remaining operations that have yet to be entered for each test.

The completed table is shown below:

[Diagram: Test results with timings]

Key:
    X = ASIC delay
    l = ld a,d opcode read
    o = out (n),a opcode read
    n = out (n),a operand read (port number)
    p = out (n),a port write (border)
  

The table reveals a pattern in the columns affected by the delays: 8 columns from 7 to 14, then 8 columns from 23 to 30, and the start of another block at column 39. These alternating blocks of 8 cells show the mode 1 slow-down in action!

Why 7 to 14 instead of 8 to 15? Well, the ASIC works one cell ahead of the display as seen, since display data must be pre-fetched ahead of it being used. It makes sense that the mode 1 delays are tied into the same logic as the display, and also happen one cell early.
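The column pattern described above can be written as a simple predicate on the cell number. This is a sketch rather than SimCoupe's actual code: the names are invented, and the result for columns outside the range covered by the test is an extrapolation of the same 16 cell period.

```python
CELLS_PER_LINE = 48
FIRST_DELAY_CELL = 7  # delays start one cell ahead of column 8
BLOCK_CELLS = 8       # 8 cells delayed, then 8 cells not delayed


def in_mode1_delay_block(cell: int) -> bool:
    """True if `cell` (0 to 47) lies in a mode 1 delay block.

    Columns outside the tested range are extrapolated (an assumption).
    """
    if not 0 <= cell < CELLS_PER_LINE:
        raise ValueError("cell must be in the range 0 to 47")
    return (cell - FIRST_DELAY_CELL) % (2 * BLOCK_CELLS) < BLOCK_CELLS


if __name__ == "__main__":
    # One character per cell: X = delayed, . = not delayed
    print("".join("X" if in_mode1_delay_block(c) else "." for c in range(CELLS_PER_LINE)))
```

Because 48 cells is exactly three periods of 16, the pattern is the same on every line without any special handling at the line boundary.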

Extrapolating from what we have above, we can predict the likely behaviour in the off-screen border areas that couldn't be tested. To test this out, the new delays were implemented in SimCoupe, and compared against a real SAM running some general timing tests. Everything matched up, as we'd hoped - we'd achieved our original goal!

Conclusion

The ASIC uses forced delays in 8-cell blocks, alternating every 8 cells across the full width of the display.

This is still only half the overall picture for mode 1, as we must consider the contention delays for the main screen area. To complete the picture we simply merge both delay patterns:

[Diagram: Basic contention pattern. Caption: Contention delays in all display modes]
[Diagram: Overall mode 1 contention pattern. Caption: Complete mode 1 contention]

If you have any questions or comments, please contact me using the address below.