Loading bars simulator - Part one: Background information and theory
Material covered in this part
Background information and theory
ZX Spectrum home computer
Video signals
Audio signals and tape format
Screen memory layout
Background information and theory
In this project, we'll create another video effect as a sequence of ppm files. This time we'll add audio into the mix too, and see how to write an audio data file to go with the video. The effects are not particularly complicated, but unlike the starfield simulator they are quite strictly defined, and require us to work out how to express various arbitrary formulas as C code in order to create them.
The specific effect we're interested in is a tape loading sequence from a 1980s home computer, which may or may not be familiar to you already. Don't worry if it isn't, as we will explain it from first principles below.
Software for home computers in the 1980s was frequently distributed on audio cassettes, stored on the tape as simple audio waveforms which the computer interpreted as binary data when played back, broadly similar to the way that facsimile machines and analogue modems communicate via an analogue telephone line. The technology was extremely primitive by today's standards, with limited error detection and almost never any form of error correction. In most cases, the audio was modulated directly from the binary data, one bit at a time, with a short synchronizing tone at the start of each block.
In order to provide some kind of feedback to the user that something was happening, the computers of the time would usually produce a visual display of some sort. This could range from a simple textual message saying that a program had been found on the tape, through to various sorts of graphical effects. Less commonly, audio effects, (in addition to the tones recorded on the actual tape), were also used as an indicator of data loading.
These audio encodings all had a very characteristic sound. Experienced users of these machines could learn to interpret the sound from the tape by ear, recognizing which type of machine the tape was recorded on, the type of data stored, (all binary zeros, bitmap data, executable code, and so forth), and whether the tape was in good condition or not. The visual display whilst the machine was loading the data from tape was also very characteristic, and as we will see below, in the case of the ZX Spectrum, especially so when data was being loaded into the video framebuffer memory.
ZX Spectrum home computer
The ZX Spectrum was a hugely successful British home computer during the 1980s, popular not only in its native country but across Europe, Russia, and even parts of South America. To this day, it still enjoys a cult following, and a wealth of information can be found about it on-line.
Due to an interesting quirk in the design of the video hardware, writing data to the framebuffer in a linear fashion doesn't result in pixel data appearing on sequential lines starting at the top of the display, but instead in a kind of interlaced pattern, the logic of which becomes very clear as soon as you look at the numbers involved expressed in binary. In fact, it's to do with the limitations of the DRAM of the day, but we'll talk more about the screen layout shortly.
In case you've never seen this effect before, I've created an approximation of it using SVG and SMIL for animation.
However, this is only an approximation, (SVG and SMIL have their limits!), for readers who are completely unfamiliar with the subject. Most significantly, the effect is intentionally being reproduced about four times as quickly as it should be, so that you don't have to spend 40 seconds or so watching it.
Also, although the yellow and blue lines would be uniform during the loading of the black pixel data, they would be of varying width when the green and magenta stripes of the attribute data are being loaded at the end. Doing this in SVG and SMIL would have been prohibitively tedious, and made the SVG source code considerably larger, but the simulation we create in C will be much more authentic and faithful to the original.
Except that we won't just stop there...
Why bother?
Although the standard effect can probably be re-created faithfully on a modern computer by some ZX Spectrum emulators, we can do it ourselves in just about 500 lines of C. Along the way, we'll learn about video timings, see how to implement arbitrary algorithms in C, and after creating an authentic simulation of the real loading sequence, we'll use a bit of artistic license to create a new and interesting effect that you couldn't create with the real hardware.
This is another great example of writing C code to simulate something in the real world that you can't just download a free software library for!
Video signals
Fundamental to re-creating this visual effect in a C program is understanding where it comes from in the first place. For that, we need to understand a bit about television video signal timings.
Like most home computers of the 1980s, the ZX Spectrum was designed to be connected to a conventional television set rather than a dedicated computer monitor. Being a British computer, it was designed to work with the standard definition television system common across Europe at the time. Commonly and incorrectly called PAL, (which only refers to the color encoding and not the video timings), the broadcast standard was 625 scanning lines in two interlaced fields, at 50 fields per second, giving 25 frames per second. If you're not familiar with this terminology, don't worry. Things are actually simpler than they sound, as the Spectrum didn't output a fully standards compliant signal anyway. Instead of 50 fields of 312.5 lines each, producing the 625 line frames of broadcast television, we just get 50 full frames of 312 lines each. No interlace, just a plain 312 line raster, 50 times per second. Nice and simple.
These numbers may be slightly uncomfortable to work with for readers familiar with the NTSC standard, (which actually does define video timings as well as color encoding), and since the frame rate doesn't match up with the common 60 Hz and 72 Hz settings found on modern computer monitors, it's basically impossible to create an entirely faithful reproduction of the effect on these displays. However, we can certainly create an accurate rendering of each frame in a set of ppm files.
So the video display produced by the computer is simply drawn 50 times per second, starting from the top left and progressing to the bottom right, just as a modern PC video card generates its display, except at a lower resolution and a lower frame rate. This is a continuous process; however, there are brief periods at the beginning of each line where the video signal is not displayed, known as the horizontal blanking periods. The first few scanning lines of each frame are also not displayed, and this is known as the vertical blanking period. We need to take these parts of the signal into account when we create our simulation, otherwise the width and positioning of the bars in the border area will be wrong, and the effect will not look authentic.
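As a very rough sketch of the bookkeeping this implies, assuming we measure elapsed time within one frame in microseconds, the scan line being drawn at any instant could be derived as below. The names are placeholders, and the 16-line figure for the vertical blanking period is an assumption used purely for illustration here, to be pinned down when we write the real code.

    #define LINES_PER_FRAME  312
    #define FRAME_PERIOD_US  20000.0                              /* 50 frames per second */
    #define LINE_PERIOD_US   (FRAME_PERIOD_US / LINES_PER_FRAME)  /* roughly 64 us        */
    #define LINES_BLANKED    16                                   /* assumed, not exact   */

    /* Return the scan line being drawn at a given time into the frame.
     * A negative result means we are still in the vertical blanking period. */
    static int scan_line_at(double us_into_frame)
    {
        return (int)(us_into_frame / LINE_PERIOD_US) - LINES_BLANKED;
    }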
Each frame of the display generated by the ZX Spectrum consists of two areas. The central area has its contents provided by a framebuffer in RAM, the details of which we will discuss shortly, but for now we can simply note that its dimensions are 256 × 192 pixels. The outer area of the display, covering the rest of the screen, is a border area. This can be set to one of eight colors, controlled by a single byte sent to a specific I/O port.
Given that the video signal generation is a continuous line-by-line process, changing the border color faster than the 50 Hz vertical refresh rate creates a pattern of horizontal bars in the border area, and this is exactly what happens during the tape loading, (or saving), procedure - the border color of the screen at any one moment simply corresponds to the relative phase of the audio waveform read from or written to the tape.
Given that the frequencies of the video signals are much higher than those of the audio signals coming from the tape that the computer is responding to when loading, it makes sense that a single 'pulse' of audio data from tape will spread out over a comparatively large area of the screen, as the video signal has covered several scanning lines during that time interval. It's also logical to assume that the frequencies of the audio and video signals will not be round multiples of each other, and so the snatch of audio data that corresponds to one video frame most likely won't align in any way with that of the next video frame, and so on.
During the constant lead-in signal, as the square wave on the tape flips its polarity, so the video signal alternates between red and cyan. The bars are a uniform width, as the signal from tape is constant, but they move on-screen, because the time taken to scan one video frame, 20 milliseconds, doesn't contain a whole number of cycles of the audio tone, which is about 807 Hz.
It is precisely this principle that gives us the somewhat uniform but varying colored bars.
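To make that concrete, here is a minimal sketch, not the code we'll actually end up with, showing how the border colour during the pilot tone could be derived directly from elapsed time. The 807 Hz figure is the pilot tone frequency quoted above, and the enum and function names are just placeholders.

    #include <math.h>

    enum border_colour { BORDER_RED, BORDER_CYAN };

    /* During the pilot tone the border simply follows the polarity of the
     * square wave on the tape at the moment the beam passes that point. */
    static enum border_colour pilot_border_at(double seconds)
    {
        const double pilot_hz = 807.0;                 /* approximate pilot tone frequency */
        double phase = fmod(seconds * pilot_hz, 1.0);  /* position within one full cycle   */
        return (phase < 0.5) ? BORDER_RED : BORDER_CYAN;
    }

One frame lasts 20 milliseconds, which is roughly 16.1 cycles of the 807 Hz tone, so each new frame starts a fraction of a cycle further through the waveform than the previous one, and the bars appear to drift.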
Audio signals and tape format
Next, we need to understand the format of audio data written to tape, as the raster bar video effect is essentially just a visual representation of the same waveforms.
Something which may come as a surprise to younger readers, is that no special hardware is involved in the generation of the audio waveforms. It's simply a square wave, created by toggling an I/O port of the Z80 CPU. If we really wanted to calculate the timings from first principles, we could do this fairly easily by looking at the ROM code for the tape saving routines. Both the routines and the Z80 CPU are very simple by today's standards, and each machine-code instruction takes a known number of clock cycles to execute, so it's not difficult to calculate the specifications of the waveform that would be generated.
However, this information is also trivial to find on-line these days, and was even documented in various books and magazines back when the machine was still enjoying its heyday. So we can avoid the need to delve into Z80 assembler at this moment to get the information that we need.
The timings are commonly expressed in a unit of ‘T-states’, which is nothing more than one clock pulse of the Z80. In the ZX Spectrum, the Z80 is clocked at 3.5 MHz, so there are nominally 3,500,000 T-states in one second. We will need to convert the values in T-states into other intervals that are more useful to us when writing the simulator, but for now we'll describe the signal in terms of them.
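As a small example of the kind of conversion involved, and assuming for the sake of illustration that we will eventually write audio at some sample rate such as 44,100 Hz (nothing here fixes that choice), a duration in T-states maps to seconds and to output samples as follows.

    #define T_STATES_PER_SECOND 3500000.0   /* Z80 clocked at 3.5 MHz */

    static double t_states_to_seconds(double t_states)
    {
        return t_states / T_STATES_PER_SECOND;
    }

    static double t_states_to_samples(double t_states, double sample_rate)
    {
        return t_states * sample_rate / T_STATES_PER_SECOND;
    }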
By default, user data consisted of two blocks, a short header block of 17 bytes, containing things such as the filename and length of the data, then a second block with the actual user data itself.
Each block of data consists of three sections: a lead-in or pilot tone, a sync pulse, and then the actual data. Both the header and the user data are extended by two bytes, a flag byte at the beginning and a checksum byte at the end.
The lead-in pulses last for 2168 T-states, so 2168 T-states on, 2168 T-states off, and so forth. The sync pulse is 667 T-states followed by 735 T-states. The actual data bits are each two pulses, of either 855 T-states for a binary 0 or 1710 T-states for a binary 1.
Immediately we can see that the symbol rate is not constant. Writing binary 0s is twice as fast as writing binary 1s. So we cannot just calculate the state of the audio waveform or the video display at any particular instant in time, without knowing what went before it.
Note, too, that the absolute polarity of the signal at any moment doesn't matter. The waveform can be completely inverted and still remain valid; what matters is the timing of the rising and falling edges of the square wave.
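Purely as a sketch of how these figures might be carried into the simulator, the timings above can be collected as constants, with a small helper giving the total length of one encoded data bit. The names here are ours, not anything standard.

    /* Pulse lengths of the standard ROM save routine, in T-states. */
    #define PILOT_PULSE  2168   /* each half-cycle of the lead-in tone     */
    #define SYNC_FIRST    667   /* first half of the sync pulse            */
    #define SYNC_SECOND   735   /* second half of the sync pulse           */
    #define BIT_0_PULSE   855   /* each of the two pulses encoding a 0 bit */
    #define BIT_1_PULSE  1710   /* each of the two pulses encoding a 1 bit */

    /* Total length of one data bit: two equal pulses. */
    static unsigned bit_length(int bit)
    {
        return 2u * (bit ? BIT_1_PULSE : BIT_0_PULSE);
    }

With the 3.5 MHz clock mentioned above, a 0 bit therefore lasts 2 × 855 / 3,500,000 ≈ 489 microseconds and a 1 bit about 977 microseconds, which is why a block full of binary 1s takes twice as long to load as a block of 0s.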
Where to begin creating a model for all this
Now that we know how the effect is generated and where it comes from, we can see how we will need to approach simulating it. All we need to do is:
Step 1
Read arbitrary data from a file
Simulate the scanning of the television signal and compute its correct color at every point based on this data
Convert the input data into the same audio frequency signal representation as used by the ZX Spectrum
The above will get us a faithful model of the raster bars, but to complete the screen loading effect we need two more elements:
Step 2
In place of arbitrary data, prepare data corresponding to an image in the format of the ZX Spectrum framebuffer
Set pixels in the central 'framebuffer' area at the time when the data for that pixel would be read from the tape
Finally, to make a particularly interesting and novel video effect we need two more elements, which we will explore in more detail later on:
Step 3
Overlay the monochrome data with a 24-bit bitmap at the point where 15-color attribute data would normally be loaded
Create fake but authentic-sounding audio data for this time period, given that 24-bit RGB color can't be represented by the standard data format
Screen memory layout
The last thing we need to familiarize ourselves with before writing the simulator is the layout of the video memory on the ZX Spectrum.
As mentioned earlier, the display dimensions are 256 pixels × 192 pixels, and it is essentially a 1-bit framebuffer, with each pixel being on or off, represented by a single bit. The color part of the display is implemented as a lower resolution array of 'attribute' bytes, effectively a two-color palette for each block of 8 × 8 pixels. So whilst each pixel can only be set to one of two states, the colors represented by those two states can be defined independently for each block of 8 × 8 pixels.
In a linear framebuffer, the pixel data for each line would be stored sequentially in memory, in the same way that the video hardware reads it out. In the ZX Spectrum, the layout of the framebuffer is not linear, but the only difference is that bits 5-7 of the address offset are swapped with bits 8-10. That's all.
This has two effects. Firstly, consecutive lines of the display are stored in widely separated memory locations, which was useful to overcome speed limitations of the RAM chips available in the 1980s. Secondly, loading data sequentially into the video RAM has the effect of drawing pixel data on the screen row by row, but with those rows spread out by eight pixels vertically, as approximated in the animation above.
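To make the bit swap concrete, here is a sketch (the function name is ours) that takes the offset a pixel row would have in a linear framebuffer, line × 32 + column, and swaps bits 5-7 with bits 8-10 to give the offset actually used in the Spectrum's screen memory.

    /* Map a linear offset (line * 32 + column) to the ZX Spectrum screen offset. */
    static unsigned spectrum_offset(unsigned linear)
    {
        unsigned low3  = (linear >> 5) & 0x07;   /* bits 5-7 of the linear offset  */
        unsigned high3 = (linear >> 8) & 0x07;   /* bits 8-10 of the linear offset */
        return (linear & ~0x07E0u) | (high3 << 5) | (low3 << 8);
    }

The linear offsets of screen lines 0, 1 and 2 (that is, 0, 32 and 64) map to Spectrum offsets 0, 256 and 512, so consecutive screen lines sit 256 bytes apart in memory; conversely, two rows stored 32 bytes apart in memory appear eight pixel lines apart on screen, which is exactly the pattern described above.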
The monochrome framebuffer data is stored in the first 6144 bytes of video RAM, followed by the 768 bytes of attribute data. This is why loading data into the framebuffer RAM from tape produces a monochrome image initially, and then overlays the color.
The three lowest bits of each attribute byte contain the color to be produced for any bits in the framebuffer set to 1. Bit 0 controls blue, bit 1 controls red, and bit 2 controls green. Bits 3-5 of the attribute byte contain the color index for any 0 bits in the framebuffer, with the same blue, red and green mappings. Bits 6 and 7 of the attribute byte have other functions. If bit 6 is set, the brightness of the two colors already defined is increased, enabling a second 8-color palette. However, both colors for each block of 8 × 8 pixels obviously have to come from the same palette, either normal or 'bright'. Bit 7, if set, enables a hardware 'flash', whereby the two colors defined in the lower six bits are repeatedly swapped back and forth, entirely in hardware.
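As a sketch of how an attribute byte might be unpacked in the simulator, with the struct and function names being hypothetical and indices 8-15 assumed to select the 'bright' variants of a 16-entry palette:

    struct attr_colours { unsigned ink; unsigned paper; int flash; };

    /* Decode one attribute byte into palette indices for set (ink) and
     * unset (paper) pixels, plus the flash flag. */
    static struct attr_colours decode_attr(unsigned char attr)
    {
        struct attr_colours c;
        unsigned bright = (attr >> 6) & 1;               /* bit 6: bright palette */
        c.ink   = (attr & 0x07)        | (bright << 3);  /* bits 0-2              */
        c.paper = ((attr >> 3) & 0x07) | (bright << 3);  /* bits 3-5              */
        c.flash = (attr >> 7) & 1;                       /* bit 7: hardware flash */
        return c;
    }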
Summary so far
That's it for the introduction and preamble. Now we have all of the information that we need to start writing the simulator, which we'll do in part two.