EXOTIC SILICON
“Programming in pure ‘c’ until the sun goes down”
Video shuttle - Part four: Processing the audio data
Material covered in this part
Processing the audio data
Processing the audio data might seem simple in comparison with implementing the video effect, but there are still some important details that we need to consider.
After calculating each frame of the output video, we need to create the section of audio that corresponds to it. To do this, we need to compute the correct values for the audio samples between: (outframe/fps*sample_rate) and ((outframe+1)/fps*sample_rate)-1. For example, assuming 30 fps and a sample rate of 48000, frame 0 will be: (0/30*48000) to (1/30*48000)-1, or samples 0 - 1599.
The input values need to come from: (tape_position_line_zero/SCALE*sample_rate/frame_height/fps) to ((tape_position_last_line/SCALE*sample_rate/frame_height/fps)-1). This will output the original audio unchanged when playback is at normal speed.
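To see these two calculations in action, here is a minimal, self-contained sketch. The variable names mirror those used in the real code, but the values chosen for SCALE and frame_height are purely illustrative. Running it shows that at normal playback speed each output frame maps back to exactly the same block of input samples, so the audio passes through unchanged:

#include <stdio.h>

#define SCALE 256            /* Illustrative value only */

int
main(void)
{
   long fps, sample_rate, frame_height;
   long outframe, out_first, out_last;
   long tape_position_line_zero, tape_position_last_line;
   long sample_from, sample_to;

   fps=30;                   /* Output frame rate */
   sample_rate=48000;        /* Audio sample rate in Hz */
   frame_height=480;         /* Illustrative value only */

   for (outframe=0; outframe<3; outframe++) {
      /* Output sample range for this frame */
      out_first=outframe*sample_rate/fps;
      out_last=(outframe+1)*sample_rate/fps-1;

      /* Simulated tape position at normal playback speed, advancing one full frame of lines per output frame */
      tape_position_line_zero=outframe*SCALE*frame_height;
      tape_position_last_line=(outframe+1)*SCALE*frame_height;

      /* Corresponding input sample range, calculated as described above */
      sample_from=(tape_position_line_zero/SCALE)*sample_rate/frame_height/fps;
      sample_to=(tape_position_last_line/SCALE)*sample_rate/frame_height/fps-1;

      printf ("Frame %ld: output samples %ld - %ld, input samples %ld - %ld\n", outframe, out_first, out_last, sample_from, sample_to);
   }
   return (0);
}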
We can be reading the simulated tape forwards, or backwards, and at either a faster or slower speed than normal. That gives us a total of four different scenarios that we need to handle. The code also needs to handle the case where we don't have any input audio, which we do by creating a sawtooth waveform at 1 kHz multiplied by the current tape speed.
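Incidentally, the sawtooth itself is trivial to generate. The following stand-alone function is purely illustrative, (the real code generates the value in-line within the copying loops shown further below), but it shows the idea: at a 48000 Hz sample rate, taking the sample index modulo 48 produces a ramp that repeats 1000 times per second, and because that index advances in step with the simulated tape position, the pitch automatically scales with the tape speed.

/* Illustrative only: one sample of a 1 kHz sawtooth for a 48000 Hz sample rate. */
/* The index is assumed to be non-negative, and to advance with the simulated tape position, */
/* so faster tape speeds automatically give a higher pitch. */
int
sawtooth_sample(long index)
{
   /* Returned values are in the range -6 to +5, the same as the in-line expression used in the real code. */
   return ((int)(((index%48)-24)/4));
}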
When the tape is stationary, we have a choice. We can output silence, which would be a faithful simulation of audio being reproduced from tracks recorded linearly along the edge of the tape, as when the tape is stationary there is obviously no signal to be read. Alternatively, we can repeatedly output the same one frame's worth of audio samples at normal speed. This creates a stuttering sound, as might be heard from replaying audio from tracks recorded helically across the tape in the same way that the video is recorded, which was a common feature on high-end domestic VCRs, as well as some professional equipment.
Linear vs helical audio recording
If you're wondering why the apparently higher quality helical audio recording method isn't universally used on professional videotape machines, there are basically two reasons.
Firstly, since the linear tape speed in professional videotape formats is usually much higher than it is in any domestic format, the quality of the linear audio tracks is correspondingly higher anyway, reducing the benefit.
Secondly, the linear tracks are fully editable, allowing dubs and inserts of new audio, something which is difficult, (although not impossible), with audio tracks recorded in helical scan.
If the tape is moving backwards, then sample_from will be a larger value than sample_to. To handle this, we either need to code two separate loops, one to copy the data in each direction, or alternatively code a single loop with a conditional increment or decrement of the counter. Since this is a programming tutorial, we'll use both approaches at different places in the code.
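As a quick illustration of the single-loop approach, (using the same variable names as the real code, but with the loop body omitted), a conditional step at the end of each iteration lets one loop walk from sample_from to sample_to in whichever direction is needed:

for (a=sample_from; (sample_to>sample_from && a<=sample_to) || (sample_to<sample_from && a>=sample_to); ) {
   /* ...copy or otherwise process input sample a here... */
   if (sample_to<sample_from) { a--; } else { a++; }
}

This is essentially the shape of the combined loop near the end of the listing below, which also adds a guard against sample_from and sample_to being equal. The two-loop approach appears in the slower-than-normal-speed cases.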
If the tape is moving at a faster than normal speed, then there will be more input audio samples than output audio samples. Instead of doing any sort of interpolation, we'll just iterate over the input samples and copy each one to the nearest output sample position.
On the other hand, if the tape is moving at a slower than normal speed, then there will be fewer input audio samples than output audio samples. In this case, we iterate over the output samples, and simply duplicate input samples as required to fill them.
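Both cases amount to crude nearest-neighbor resampling. Purely as an illustration of the general idea, (the function below is not the code used in the project, and it operates on a plain array of 16-bit samples rather than the raw byte buffers that the real code works with), a single loop over the output samples handles stretching and squashing alike:

#include <stdint.h>

/* Illustrative only: nearest-neighbor resample of in_count samples to out_count samples. */
void
resample_nearest(const int16_t *in, long in_count, int16_t *out, long out_count)
{
   long i;

   for (i=0; i<out_count; i++) {
      out[i]=in[i*in_count/out_count];
   }
}

The real code iterates over the output samples in the slower-than-normal cases and over the input samples in the faster-than-normal case, but the end result is much the same.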
I originally implemented these very simplistic ways of re-sampling the audio as placeholders for code implementing a more complex approach. However, in practice, the audio quality seems perfectly fine for this application. Considering that audio reproduced at the high speeds typical of a video shuttle effect is going to be fairly unintelligible anyway, the fact that there are some resampling artifacts on it probably doesn't matter very much.
The audio processing code looks like this:
/* printf ("For output frame %d, tape position is from %ld - %ld\n", outframe, tape_position_line_zero, tape_position_last_line); */
if (flag_stutter_on_pause==1) {
if (tape_position_line_zero==tape_position_last_line && tape_velocity==0) {
/* printf ("Multi-frame pause at %d\n",frame); */
tape_position_last_line=tape_position_last_line+(SCALE*frame_height)-1;
}
if (tape_position_line_zero==tape_position_last_line && tape_velocity!=0) {
/* printf ("Single-frame pause or end of multi-frame pause at %d\n",frame); */
}
}
sample_from=((tape_position_line_zero/SCALE)*sample_rate/frame_height/fps);
sample_to=(((tape_position_last_line/SCALE)*sample_rate/frame_height/fps)-1);
if (sample_to<0) { sample_to=0; }
if (sample_to>audio_in_total_samples-1) { sample_to=audio_in_total_samples-1; }
if (sample_from<0) { sample_from=0; }
if (sample_from>audio_in_total_samples-1) { sample_from=audio_in_total_samples-1; }
/* printf ("For output frame %d, input audio samples are from %ld - %ld.\n\n", outframe, sample_from, sample_to); */
if (labs(sample_from-sample_to)<(samples_per_frame-1)) {
/* Fewer than samples_per_frame input samples, so we need to duplicate or interpolate them to the output. */
if (sample_from>sample_to) {
/* Audio is being reversed at < 1x normal speed. */
for (a=0; a<samples_per_frame; a++) {
/* printf ("Out sample %d, from in %ld\n",a,sample_from-(labs(sample_from-sample_to)*a/samples_per_frame)); */
if (audiobuffer_in!=NULL) {
for (sample_subbyte=0; sample_subbyte<bytes_per_sample; sample_subbyte++) {
*(sample_subbyte+audiobuffer_out+bytes_per_sample*(samples_per_frame*outframe+a))=
*(sample_subbyte+audiobuffer_in+(sample_from-labs(sample_from-sample_to)*a/samples_per_frame)*bytes_per_sample);
}
} else {
/* No input audio supplied, so we generate a sawtooth wave */
*(audiobuffer_out+bytes_per_sample*(samples_per_frame*outframe+a))=
((((sample_from+labs(sample_from-sample_to)*a/samples_per_frame))%48)-24)/4;
}
}
} else {
/* Audio is going in the forward direction at < 1x normal speed. */
for (a=0; a<samples_per_frame; a++) {
/* printf ("Out sample %d, from in %ld\n",a,sample_from+(labs(sample_from-sample_to)*a/samples_per_frame)); */
if (audiobuffer_in!=NULL) {
for (sample_subbyte=0; sample_subbyte<bytes_per_sample; sample_subbyte++) {
*(sample_subbyte+audiobuffer_out+bytes_per_sample*(samples_per_frame*outframe+a))=
*(sample_subbyte+audiobuffer_in+(sample_from+labs(sample_from-sample_to)*a/samples_per_frame)*bytes_per_sample);
}
} else {
/* No input audio supplied, so we generate a sawtooth wave */
*(audiobuffer_out+bytes_per_sample*(samples_per_frame*outframe+a))=
((((sample_from+labs(sample_from-sample_to)*a/samples_per_frame))%48)-24)/4;
}
}
}
} else {
/* At least samples_per_frame input samples, so we can just skip some of them, or average several samples to produce the output ones. */
for (a=sample_from; ((sample_to>sample_from && a<=sample_to) || (sample_to<sample_from && a>=sample_to)) && (sample_from!=sample_to); ) {
sample_out=(samples_per_frame*outframe+((samples_per_frame-1)*(a-sample_from)/(sample_to-sample_from)));
/* printf ("Input sample %d goes to output sample %ld\n",a,sample_out); */
if (audiobuffer_in!=NULL) {
for (sample_subbyte=0; sample_subbyte<bytes_per_sample; sample_subbyte++) {
*(audiobuffer_out+bytes_per_sample*sample_out+sample_subbyte)=*(audiobuffer_in+bytes_per_sample*a+sample_subbyte);
}
} else {
/* No input audio supplied, so we generate a sawtooth wave */
*(audiobuffer_out+bytes_per_sample*sample_out)=(((bytes_per_sample*a)%48)-24)/4;
}
if (sample_to<sample_from) { a--; } else { a++; }
}
}
First, we check to see if flag_stutter_on_pause is set. If it is, then we check whether the tape velocity has remained at zero for at least two frames. We do this by seeing whether it's zero now, after having been adjusted by the call to new_tape_velocity, and whether it was also zero during the last video frame, which we can tell from the tape position at the first and last lines being the same. If the tape is indeed paused and not moving, then we artificially change the value that we have for the tape's ending position to what it would be one frame ahead. This is fine, because we've already processed the video for this frame, and this value will be overwritten on the next iteration of the loop.
Next, we calculate sample_from and sample_to. Note that these variables are not of type int, but were declared as long, so the expression should be safe from overflowing, despite the large multiplication of tape position and sample rate.
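As a rough worked example, (assuming a frame height of 480 lines, purely for illustration), one hour of tape at 30 fps is 108000 frames, or about 51.8 million lines once the tape position has been divided down by SCALE. Multiplying that by a 48000 Hz sample rate gives an intermediate value of roughly 2.5 trillion, far beyond the 2147483647 limit of a signed 32-bit int, but comfortably within the range of a 64-bit long on a typical LP64 system.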
The first scenario we check for is that of audio being reversed at a slower than normal speed. If it is, then assuming that the input audio buffer contains valid data at all, we traverse the output samples and copy an appropriate input sample to the output buffer at each position. If the input audio buffer doesn't contain valid data because we couldn't read the original input file, then we synthesize a sawtooth wave. The wave is generated as if it had been recorded on the tape at 1 kHz, and the pitch varies with the simulated tape speed.
Next, we check for the tape running forwards at slower than normal speed. The code to deal with this case is identical, apart from the sign used in the calculation of the position of the source sample to copy from:
*(sample_subbyte+audiobuffer_in+(sample_from - labs(sample_from-sample_to)*a/samples_per_frame)*bytes_per_sample);
*(sample_subbyte+audiobuffer_in+(sample_from + labs(sample_from-sample_to)*a/samples_per_frame)*bytes_per_sample);
Code duplication is usually a bad thing, so I've combined the remaining two cases of forwards and backwards tape movement at greater than normal speed into a single outer loop. This reduces the amount of code by about half, at the expense of some readability.
All we do is to copy each input sample to its corresponding position in the output. If two or more input samples fall at the same output position, we just overwrite the earlier ones with the later ones. This actually seems to work surprisingly well.
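For example, at exactly 2x normal speed there are roughly 3200 input samples competing for 1600 output positions, so on average two input samples land on each output position and only the later of them survives. In effect this is simple decimation rather than any kind of averaging, which is consistent with the earlier observation that the result still sounds perfectly acceptable at shuttle speeds.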
Summary and conclusions
In this project, we've created a realistic video shuttle effect with matching audio. We've seen how to turn an abstract problem into C code and implement it from first principles. We've seen how to generate a framebuffer with color bars, resample audio, and downsample 16-bit ppm files to 8-bit, and we've also noted some good programming practices along the way.
The effect works well enough visually to be usable in real video presentations. Performance of the code is acceptable, but could certainly be improved.
Check back for more programming projects in the near future!
It's all my own work!