Video shuttle - Part three: Processing the video data
Material covered in this part
Processing the video data
Adding jitter
Adding low and high frequency video noise
Processing the video data
Now that we have most of the supporting functions that we'll need, we can move on to what is perhaps the most interesting part of this project, which is to actually implement the simulation of reading frames from a moving video tape.
Obviously we'll need two nested loops, one to iterate over the output frames, and another one to iterate over the lines of the image. The minimum code necessary to generate a useful output will therefore look something like this:
int outframe;
int tape_position_line_zero;
int current_output_line;
int top_frame;
int lower_frame;
int previous_input_frame;
previous_input_frame=-1;    /* force the first input frame to be loaded */
for (outframe=0; outframe<total_output_frames; outframe++) {
    if (total_output_frames>=10 && outframe%(total_output_frames/10)==0) { printf ("Reached frame: %04d\n",outframe); }
    if (tape_position<0) { printf ("Tried to rewind before first frame.\n"); tape_position=0; tape_velocity=0; }
    tape_position_line_zero=tape_position;
    for (current_output_line=0; current_output_line<frame_height; current_output_line++) {
        /* Formula from part one: which input frame is the head over for this line? (The exact scaling shown here is an assumption.) */
        frame=tape_position/(frame_height*SCALE);
        if (current_output_line==0) { top_frame=frame; }
        if (current_output_line==(frame_height-1)) { lower_frame=frame; }
        if (frame!=previous_input_frame) { read_input_frame(frame); previous_input_frame=frame; }    /* only reload when the frame changes; call shown schematically */
        outline(framebuffer_in,framebuffer_out,frame_width,frame_height,current_output_line,0,0,randomnoise);    /* noise and add_line_jitter hard-coded to 0 for now */
        tape_position=tape_position+tape_velocity;
    }
    printf ("For output frame %d, input frames are from %d - %d, and tape speed is %d/%d\n",outframe,top_frame,lower_frame,tape_velocity,SCALE);
    write_output_frame(outframe);    /* write the finished frame to disk; call shown schematically */
    tape_velocity=new_tape_velocity(outframe,tape_velocity,guide);
}
/* At this point, all of the output video frames have been written to disk. */
This code includes calls to a couple of functions that we haven't seen yet, specifically the function outline(), which will copy a single line from the input framebuffer to the output framebuffer, and new_tape_velocity(), which parses the guide file and returns a new value for tape_velocity.
However, the block of code above actually implements a very primitive version of the formula that we discussed in part one, so let's go over how it works before we look at those two other functions.
First, we set previous_input_frame to a dummy value of -1. This variable is used simply to check whether we need to load a different frame into the input framebuffer, or whether we are still processing the same frame as on the previous line, in which case we can avoid a call to read_input_frame. Obviously when we start there is nothing in the framebuffer yet, and we will need to load the first input frame into it, so we set the value to something other than the first input frame that we will need, which will be frame 0000.
Next we start the outer loop, using outframe to iterate from 0 to total_output_frames-1. As this program can run quite slowly on older hardware, or when processing a lot of large image files, we print a diagnostic message to the console every time we complete roughly 10% of the overall process. Note the check for total_output_frames being greater than or equal to ten. This has nothing to do with the simulation as such; it purely avoids a division by zero in the modulo expression that follows, where we work out whether we have reached a round 10% mark before printing the diagnostic message.
After all this time, we finally come to the first piece of code that actually has something to do with the simulation itself. We check whether we have tried to rewind the tape before position zero, and if we have, we just set both the tape position and the tape velocity to zero, effectively pausing things at the beginning of the input.
In fact, the code as written does mostly handle negative tape positions, but this was never an intended feature and offers no obvious benefits.
We store the value of tape_position before entering the inner loop that iterates over the lines, as we will need to know the starting and ending positions when we come to process the matching segment of audio data. However, it's also useful for producing diagnostic output.
The inner loop is where all of the really interesting video processing is going to be done. For now it's very basic, but we can already see our original formula for calculating which part of the tape we are reading: the line that derives frame from tape_position by dividing by the scaled frame height.
Since all of the variables we are using are integer variables, this expression will be evaluated as an integer and the result will simply be rounded down. Nevertheless, this is already sufficient to create an interesting effect. Although we won't yet generate any noise bars, when moving at several times normal playback speed, the output frames will be constructed from several different input frames, creating a tearing pattern.
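As a quick worked example, assuming (as in the listing above) that tape_position is held in tenths of a line and that SCALE is 10: with a frame height of 540 lines, one whole input frame corresponds to 540 x 10 = 5400 units of tape_position. A tape_position of 40000 therefore gives 40000 / 5400 = 7 (rounded down from roughly 7.4), so that output line is read from input frame 7. The discarded remainder, about 0.4 of a frame, is exactly the head misalignment that we will start to make use of later in this part.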
We store the number of the input frame used to read the top and bottom lines purely so that we can print this information on the console later. These values are not otherwise needed, but they were very useful in early development before I had the graphics functions implemented. Yes, I did a lot of the early development just looking at lists of numbers, with no actual graphical output:
Guide file
0030
30
+010
0000
**
.....>>>>>>>>>>>>>>>>>>>>.....
Program output
For output frame 0, input frames are from 0 - 0, and tape speed is 10/10
For output frame 1, input frames are from 1 - 1, and tape speed is 10/10
For output frame 2, input frames are from 2 - 2, and tape speed is 10/10
For output frame 3, input frames are from 3 - 3, and tape speed is 10/10
For output frame 4, input frames are from 4 - 4, and tape speed is 10/10
For output frame 5, input frames are from 5 - 5, and tape speed is 10/10
For output frame 6, input frames are from 6 - 6, and tape speed is 11/10
For output frame 7, input frames are from 7 - 7, and tape speed is 12/10
For output frame 8, input frames are from 8 - 8, and tape speed is 13/10
For output frame 9, input frames are from 9 - 9, and tape speed is 14/10
For output frame 10, input frames are from 11 - 11, and tape speed is 15/10
For output frame 11, input frames are from 12 - 13, and tape speed is 16/10
For output frame 12, input frames are from 14 - 14, and tape speed is 17/10
For output frame 13, input frames are from 15 - 16, and tape speed is 18/10
For output frame 14, input frames are from 17 - 18, and tape speed is 19/10
For output frame 15, input frames are from 19 - 20, and tape speed is 20/10
For output frame 16, input frames are from 21 - 22, and tape speed is 21/10
For output frame 17, input frames are from 23 - 24, and tape speed is 22/10
For output frame 18, input frames are from 25 - 27, and tape speed is 23/10
For output frame 19, input frames are from 28 - 29, and tape speed is 24/10
For output frame 20, input frames are from 30 - 31, and tape speed is 25/10
For output frame 21, input frames are from 33 - 34, and tape speed is 26/10
For output frame 22, input frames are from 35 - 37, and tape speed is 27/10
For output frame 23, input frames are from 38 - 40, and tape speed is 28/10
For output frame 24, input frames are from 41 - 42, and tape speed is 29/10
For output frame 25, input frames are from 44 - 45, and tape speed is 30/10
For output frame 26, input frames are from 47 - 48, and tape speed is 30/10
For output frame 27, input frames are from 50 - 51, and tape speed is 30/10
For output frame 28, input frames are from 53 - 54, and tape speed is 30/10
For output frame 29, input frames are from 56 - 57, and tape speed is 30/10
It accurately describes the tape motion, as we hoped, but text output isn't much fun, so let's look at the two remaining functions: outline(), which we need for graphical output, and new_tape_velocity(), which we need in order to parse the guide file.
The outline function
This function copies a single line of pixel data from the input framebuffer to the output framebuffer, transforming it in various ways depending on the parameters supplied. This is where the actual noise is inserted into the image as required.
/* Copy a single line of pixel data from the input framebuffer to the output framebuffer. */
/* If the value of noise supplied is above 50, output either a solid bar or random noise, depending on the value of flag randomnoise supplied. */
/* If randomnoise is set to 0, the noise bars are rendered as solid. If set to 1, they are rendered as noise in a way that visually resembles tape noise. */
/* Add a slight line jitter to each line, based on the value of add_line_jitter, which can be zero, to simulate the effect of uncorrected timebase errors. */
int outline(unsigned char * framebuffer_in, unsigned char * framebuffer_out, int frame_width, int frame_height, int line, int noise, int add_line_jitter, int randomnoise)
There are quite a few things going on here, most of which will not be activated by the code we currently have in the main function as we have hard-coded values of 0 for the parameters noise and add_line_jitter.
Basically, this function can potentially change the source image in three ways:
Firstly, we can add a small horizontal offset to each line, a different random offset for each line. This causes a subtle shimmering of the image, and is a simple simulation of uncorrected timebase errors. This is intended to be activated increasingly at tape speeds above that of normal playback.
Secondly, and perhaps most obviously visually, we can replace an entire line with either a white bar or low frequency random noise.
Finally, we can add a small amount of high-frequency noise on top of the video image. The idea is to completely replace entire lines with low-frequency noise at points where we wouldn't be reading a signal at all, and to add subtle high-frequency noise on top of lines where we would be reading a weak signal.
Types of noise:
It's important to use low-frequency noise for this, as otherwise the result looks completely artificial and unlike real tape noise:
Low-frequency noise more closely resembles tape noise displayed by a domestic VCR...
...When compared to simple random digital pixel values like this.
We start by setting an initial value for noisebyte. This variable holds the value that we will write out if we are replacing the line with low-frequency noise. To generate low-frequency noise, we simply add or subtract a small random value from the previous value, so that overall it can vary between 0 and 255, but not flip from one extreme to the other across a single pixel. The expression arc4random_uniform(33)-16 will evaluate to a random number between -16 and +16, and these values were found by experimentation to yield nice looking results.
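As a minimal sketch of that random walk, written out for a single line (the starting value and the explicit clamping to the 0-255 range are assumptions on my part; only the arc4random_uniform(33)-16 step comes from the description above):

noisebyte=192;                                             /* starting value; an assumption */
for (x=0; x<frame_width; x++) {
    noisebyte=noisebyte+(int)arc4random_uniform(33)-16;    /* drift by between -16 and +16 per pixel */
    if (noisebyte<0) { noisebyte=0; }                      /* keep the walk within the 0-255 range */
    if (noisebyte>255) { noisebyte=255; }
    /* noisebyte is now the low-frequency noise value for pixel x of this line */
}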
If the supplied parameter randomnoise is set to zero, we simply use the constant value 192 in place of random values, producing a light grey bar. This is more typical of professional video equipment, so the visual effect might admittedly look out of place to anybody who is only familiar with domestic VCRs.
The line jitter is simply a random number between -(add_line_jitter/2) and +(add_line_jitter/2), which is added to the memory location that we read from in the input framebuffer (after being multiplied by three, to account for the fact that each pixel is stored as three bytes). All of this happens in the macro defined at the beginning of the function, SOURCE_BYTE_ADDR, which is nothing more than the address in the input framebuffer that we want to copy a particular pixel from.
If the value of linejitter would take us beyond the limits of the input framebuffer, we simply return an RGB pixel value of zero. This does mean that with line jitter enabled, the edges of the frame for all but the first and last lines may contain pixel data from the previous or next line, which is technically inaccurate and wouldn't happen in a real analogue VCR. This could be avoided, but since the visual effect is not displeasing it doesn't seem very important.
When the supplied value of the noise parameter is greater than 25, the line is replaced with low-frequency noise. When it is 25 or below, the line is output as normal, but with the line jitter mentioned above, plus a degree of high-frequency noise calculated as a random number between 0 and (noise-1). Unlike the other random numbers we have generated in this function, this range contains no negative values, and that is deliberate: we want the noise to be more visible in dark areas of the image. We calculate the maximum amount of noise that we can add without overflowing the maximum single-byte value of 255, and if the random value generated would take us beyond that, we simply clip it and apply that maximum. Statistically, this means that pixels which are already very light are affected to a lesser degree, which creates a nicer visual effect.
Again, technically it would probably be more correct to add the same high frequency noise to each of the red, green, and blue channels for each pixel, but the images generated by treating each channel separately are visually pleasing, so I decided to leave it that way.
Essentially, the core of this function is simply a loop writing these calculated values to the output framebuffer.
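To tie all of this together, here is a sketch of what the body of outline() could look like. The overall structure, the reset of noisebyte on every call and the exact bounds test are my assumptions; the threshold of 25, the grey value of 192, the random walk of -16 to +16, the 0 to (noise-1) high-frequency noise and the SOURCE_BYTE_ADDR macro come from the description above.

#include <stdlib.h>   /* arc4random_uniform() - available on macOS/BSD; on Linux link against libbsd */

int outline(unsigned char * framebuffer_in, unsigned char * framebuffer_out, int frame_width, int frame_height, int line, int noise, int add_line_jitter, int randomnoise)
{
    /* Address of byte b of pixel x on this line in the input framebuffer, shifted by the line jitter. */
    #define SOURCE_BYTE_ADDR(x,b) (framebuffer_in+((line*frame_width+(x)+linejitter)*3)+(b))
    int x, b;
    int linejitter=0;
    int noisebyte=192;     /* starting value for the random walk, and the shade used for the solid bar */
    unsigned char value;
    /* A small random horizontal offset for this line, simulating uncorrected timebase errors. */
    if (add_line_jitter>0) { linejitter=(int)arc4random_uniform(add_line_jitter+1)-(add_line_jitter/2); }
    for (x=0; x<frame_width; x++) {
        if (noise>25) {
            /* Replace the whole line: either a random walk (low-frequency noise) or a solid grey bar. */
            if (randomnoise==1) {
                noisebyte=noisebyte+(int)arc4random_uniform(33)-16;
                if (noisebyte<0) { noisebyte=0; }
                if (noisebyte>255) { noisebyte=255; }
            }
            for (b=0; b<3; b++) { *(framebuffer_out+((line*frame_width+x)*3)+b)=(unsigned char)noisebyte; }
        } else {
            for (b=0; b<3; b++) {
                /* Copy the source pixel, or black if the jitter pushes us outside the input framebuffer. */
                if ((line*frame_width+x+linejitter)<0 || (line*frame_width+x+linejitter)>=(frame_width*frame_height)) { value=0; }
                else { value=*SOURCE_BYTE_ADDR(x,b); }
                /* Sprinkle a little high-frequency noise on top, clipped so that we never exceed 255. */
                if (noise>0) {
                    int extra=(int)arc4random_uniform(noise);     /* 0 .. noise-1 */
                    if (value+extra>255) { extra=255-value; }
                    value=(unsigned char)(value+extra);
                }
                *(framebuffer_out+((line*frame_width+x)*3)+b)=value;
            }
        }
    }
    #undef SOURCE_BYTE_ADDR
    return (0);
}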
Now that we have a function to actually add the noise to the image, the next question is how to calculate where the noise bars fall, and how wide they are. This we will do back in the main function, but first we should look at the new_tape_velocity function, as we need it to control the simulated tape motion.
Simulating the tape motion
This should be fairly self-explanatory. We're just changing the value of tape_velocity based on the ASCII character read from the guide file, and returning the new value to the calling function.
/* Adjust the velocity of the simulated tape, accelerate or decelerate based on characters in the supplied text string. */
/* Call with current frame, current tape velocity, and the controlling text string. */
/* Returns the new tape velocity. */
int new_tape_velocity(int outframe, int tape_velocity, unsigned char * guide)
{
    /* The control characters start at byte 21 of the guide buffer, immediately after the header lines. */
    if (*(guide+21+outframe)=='>') { tape_velocity++; }
    if (*(guide+21+outframe)=='<') { tape_velocity--; }
    if (*(guide+21+outframe)==']') { tape_velocity+=2; }
    if (*(guide+21+outframe)=='[') { tape_velocity-=2; }
    if (*(guide+21+outframe)=='}') { tape_velocity+=3; }
    if (*(guide+21+outframe)=='{') { tape_velocity-=3; }
    return (tape_velocity);
}
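For reference, here is how a call to it might slot into the main loop, together with what the example guide file shown earlier actually does to the speed. Exactly where in the loop the call is made from is my assumption.

/* One call per output frame, updating the velocity for the frames that follow. */
tape_velocity=new_tape_velocity(outframe,tape_velocity,guide);
/* With the example guide file: the five leading '.' characters leave the speed at 10/10, the */
/* twenty '>' characters then raise it by 1 per frame until it reaches 30/10, and the trailing */
/* '.' characters hold it there - the same ramp that appears in the diagnostic output above.   */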
Back to the main function
Now that we know how the outline function works, all that remains to get the video output we want is to calculate appropriate values for the noise and add_line_jitter parameters, which are currently hard-coded to zero.
Frame jitter vs line jitter
An important note: in the main function we will shortly introduce another 'frame jitter' value, but this is completely different and has nothing to do with the line jitter in the outline function. They are two distinct concepts, so be careful not to confuse them.
When we originally discussed the formula for calculating the current frame number from the tape position, we noted that the fractional part indicated the offset of the video head from the correct alignment over the track. In the first version of the code in the main function presented above, we just performed an integer division and discarded the remainder. This had the effect of keeping us on the current frame until we were completely aligned with the next one. Note that I'm talking here about the different frames read during one sweep of the head, that is to say, when the tape is moving forwards at high speed, and one output frame will consist of fragments of several input frames. I'm not talking about simple playback at normal speed, because of course in that situation we will always read the start of the next frame after finishing the current one.
The effect of the integer division was to create the torn effect of one output frame being made up from several input frames, but the transition was abrupt. What we now need to do is to process the fractional part of the division, or the remainder:
for (outframe=0; outframe<total_output_frames; outframe++) {
    /* Process video first */
    if (total_output_frames>=10 && outframe%(total_output_frames/10)==0) { printf ("Reached frame: %04d\n",outframe); }
    if (tape_position<0) { printf ("Tried to rewind before first frame.\n"); tape_position=0; tape_velocity=0; }
    tape_position_line_zero=tape_position;
    for (current_output_line=0; current_output_line<frame_height; current_output_line++) {
        /* As before, but now the remainder of the division is kept as 'offset' and used to derive */
        /* the noise and add_line_jitter values passed to outline(). A sketch of this body follows. */
    }
}
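To make the structure concrete, here is a rough sketch of what the body of that inner loop can look like. The adjustment of frame and offset against HALF_FRAME_HEIGHT is the same logic that appears in the final listing further down; the way offset itself is derived, and the formulas turning it into noise and add_line_jitter values, are illustrative assumptions on my part rather than the exact expressions used to produce the screenshots.

/* Sketch only. Which input frame is the head over, and how far (in lines) is it from where */
/* line current_output_line of that frame actually sits on the tape?                        */
frame=tape_position/(frame_height*SCALE);
offset=(tape_position/SCALE)-(frame*frame_height)-current_output_line;
/* If we are more than half a frame away, the neighbouring frame's copy of this line is closer. */
if (offset>HALF_FRAME_HEIGHT) { offset=offset-frame_height; frame++; }
else if (offset<-HALF_FRAME_HEIGHT) { offset=offset+frame_height; frame--; }
if (current_output_line==0) { top_frame=frame; }
if (current_output_line==(frame_height-1)) { lower_frame=frame; }
if (frame!=previous_input_frame) { read_input_frame(frame); previous_input_frame=frame; }
/* Illustrative formulas only: a narrow noise bar close to the guard band between tracks,  */
/* fading high-frequency noise either side of it, and line jitter above normal speed.      */
noise=(abs(offset)*100)/HALF_FRAME_HEIGHT-70;
if (noise<0) { noise=0; }
add_line_jitter=0;
if (tape_velocity>SCALE) { add_line_jitter=(tape_velocity-SCALE)/4; }
outline(framebuffer_in,framebuffer_out,frame_width,frame_height,current_output_line,noise,add_line_jitter,randomnoise);
tape_position=tape_position+tape_velocity;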
And this is the output at frame 24, using the internal color bar generator and image dimensions of 960×540 pixels:
Not at all bad. We can clearly see the noise bars, as well as the effects of the line jitter and the high frequency noise just above and below each noise bar, which is more visible over the darker colors to the right. It's good so far, but we can make some further improvements.
Firstly, notice that the top and bottom of each noise bar has a very sharp and distinct edge. That's not particularly accurate, because the signal doesn't just suddenly become too weak to read at a precise and well-defined point; it gradually fades as the tracking of the heads becomes less accurate, and at the transition point the weak signal may or may not 'break through' the noise. This effect would be particularly noticeable when the tape is paused and we see consecutive readings of the same frame. In that case, the edges of the noise bars jitter somewhat, and there may be lines near the edges which do actually read a signal.
To implement this subtlety, we just add a slight jitter to the tape position.
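As a sketch of the idea (the variable name frame_jitter, the range of a few lines and exactly where it is applied are assumptions on my part, not the values used in the article):

/* Nudge the position that frame and offset are derived from by a few lines at random, so that */
/* the edges of the noise bars flicker from output frame to output frame.                      */
frame_jitter=(int)arc4random_uniform(7)-3;            /* roughly -3 to +3 lines; range is an assumption */
jittered_position=tape_position+(frame_jitter*SCALE);
/* frame and offset are then derived from jittered_position instead of tape_position, exactly */
/* as in the sketch above, while tape_position itself still advances normally.                */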
You have to look carefully to see the difference, but it's there, and it improves the visual effect.
Comparing the two images side by side, you can clearly see that in the right hand image the green and magenta color bars are visible through the noise for a few lines at the top and bottom of the noise bar, whereas on the left they are not.
Once again, the exact values used in the calculations of the noise levels and the jitter were mostly found by experimentation. They are intentionally scaled by the frame height and the global scale factor at certain points, in an attempt to keep the effect constant across different sizes of input and different scale factors. If you process input images at a resolution of, for example, 1920×1080, and then rerun the program with the same images resized to 960×540, or even to a different aspect ratio such as 800×600, the visual effect of the noise bars, principally their width, should remain fairly constant if you use the same formulas unchanged.
The last thing to add before we move on to the audio is a frame-locking effect when returning the tape to normal speed after a shuttle. In a real VCR, just because the tape is moving at the correct forward playback speed, it doesn't necessarily mean that the heads are aligned with the centre of the video tracks. Without any effort to track the signal, the heads could happily read noise from the inter-track guard bands for the entire duration of the playback; this is precisely what happens when the tracking adjustment is incorrectly set. What should happen is that once the tape has been running for a few frames at normal speed, the sync pulses recorded on the tape are detected, and a very small change is applied to the tape speed or head rotation to bring everything into sync. This is often visible on-screen as two or three fields almost completely full of noise, which quickly clear to leave a good picture.
We can easily simulate this, just by adding a few lines to count the number of frames played at normal speed, introduce some noise if it's between 1 and 3, and thereafter keep the offset parameter at zero whilst we continue playing at normal speed:
/* Count the number of frames that we are running at normal speed, so we can ensure that we sync up after a few */
if (tape_velocity==SCALE) { frame_lock_count++; } else { frame_lock_count=0; }
for (current_output_line=0; current_output_line<frame_height; current_output_line++) {
    /* (frame and offset are derived from the tape position here, as before.) */
    /* Note the current frame when processing the first and last line, to print debug info later on. */
    if (current_output_line==0) { top_frame=frame; }
    if (current_output_line==(frame_height-1)) { lower_frame=frame; }
    if (offset>HALF_FRAME_HEIGHT) {
        offset=offset-frame_height;
        frame++;
    } else {
        if (offset<-HALF_FRAME_HEIGHT) { offset=offset+frame_height; frame--; }
    }
    /* New: lock the picture once we have been at normal speed for a few frames, but let the  */
    /* first three locked frames fill with noise while the simulated VCR syncs up.            */
    if (frame_lock_count>0) { offset=0; }
    if (frame_lock_count>0 && frame_lock_count<4) { offset=arc4random_uniform(2*frame_width/(5+frame_lock_count)); }
    /* ...and the rest of the inner loop continues as before. */
Technically, it would be more accurate to adjust the tape position, rather than simply set the offset to zero, but the only difference that would make would be to the exact position of the noise bars that appeared when the tape went back into a shuttle operation. Since we're not trying to accurately simulate any particular video recording format anyway, this point seems moot.
Now that we have the video side of the effect sorted out, we just need to process the audio to go with it.
Summary so far
In this part, we've actually implemented the formula that we originally worked out and seen that it does indeed produce realistic-looking noise bars. We've made a few improvements, and introduced some more randomness to the way the bars are drawn, in the form of line and frame jitter, as well as adding some high frequency noise at the edges of the noise bars.
In the final part of this project, we'll see how to process the audio data.