Detecting Vertical Retrace in Microsoft Windows

Windows is not designed for real-time programming, and the vertical retrace just happens to be a real-time event. This paper talks about kernel drivers, assembly language and multitasking issues. I summarized it in an article for Windows Developer's Journal, August 2000 (volume 11, issue 8). You can obtain the source code published in the article from the code archives of Windows Developer's Journal.

Since publication, I found one bug in one of the drivers (the VxD). This bug is fixed in the vretrace.zip archive (37 kBytes) downloadable from this site. The ZIP file contains the source code and precompiled binaries, plus header files and a test program. A note on the source code: if you want to recompile the source, expect to need to modify the makefiles. A note on the test program: the kernel driver must be installed and started under Windows NT/2000/XP (and later) before a program can use it, and under Windows 95/98 the VxD must either be in the same directory as the .EXE or in a Windows "system" directory.

Like ActiveX/OLE Controls, Windows NT/2000 kernel drivers must be set up in the registry before they work. Usually installation programs or applications do this by calling into the "Service Control Manager" (SCM). This is not very convenient when you just want to set up a driver quickly. To make installing and removing drivers a little easier, I made the "REGDRV" utility (19 kBytes), which is modelled after the REGSVR32 program. The ZIP archive includes a readme with usage notes.

If you have found this page, you have probably read other articles on Usenet or the Web on vertical retrace (under Windows) as well. This paper is partially written in response to these other articles. The article in Windows Developer's Journal is more focused and it is probably better written. Still, this HTML paper stays useful: it focuses on an analysis of the problem, rather than jumping straight to a solution. Also, it does not suffer from "size constraints"; a typical magazine article may not exceed 2000 words.

Downloads
Why detection of the vertical retrace is (still) needed
Why not simply use DirectX?
Detecting the vertical retrace, the basic operation
What to do if you drop in the middle of a vertical retrace?
What if some (future) video card does not support a vertical retrace bit at the particular I/O port address?
Multi-tasking issues
Determining the vertical refresh frequency
Reducing the overhead of DeviceIoControl()

Downloads

The vretrace.zip archive (37 kBytes)
The source code and precompiled binaries of the drivers that detect vertical retrace, plus header files and a test program.
The "REGDRV" utility (19 kBytes)
A utility to load and start kernel drivers on Windows NT/2000 (not needed for Windows 95/98/ME).

Why detection of the vertical retrace is (still) needed

Animation is subtle. The common saying is that we perceive a sequence of pictures as "motion" if this sequence is flipped through quickly enough. How quick, exactly, depends on a few factors: it can be as low as 10 fps (frames per second) to as high as 40 fps. Two things that I have found to be detrimental to the fluency of animation (and thereby the illusion of motion) are "tearing" and irregularities in the timing interval (or irregularities in the step size of moving objects).

Tearing occurs when an object's position changes and the electron guns of the CRT update the new and/or old areas of the object during the blit operation. If that happens, the computer display will show a situation where the top part of the image is updated and the bottom part lags a frame behind (assuming that the blit goes top-down). This situation will only exist for a fraction of a second: if the vertical refresh rate of the monitor is 60 Hz, the display is updated 17 ms later. However, it exists long enough for us to perceive an irregularity in the object's shape, possibly due to phosphor afterglow. To avoid tearing, you should do as much of the blit as possible during the vertical retrace period. Therefore, it is necessary to be able to detect the start of the vertical retrace.

Waiting for a vertical retrace before blitting can cause irregularities in the timing interval. For example, assume an animation that runs at 25 fps (on average) and a monitor with a vertical refresh rate of 60 Hz. A new frame should appear every 40 ms, but since you are synchronizing the blit on a vertical retrace, the distance (in time) between two frames is always a multiple of the period between two vertical retraces (16.7 ms). That is, the frames are either 50 ms or 33 ms apart. Since the average frame rate must still be 25 fps (40 ms per frame), the interval between the frames (in milliseconds) is going to look like:

50 33 50 33 33 50 33 50 33 33 50 33 50 33 33 ...
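The sequence can be reproduced with a few lines of integer arithmetic. The following is a minimal sketch, not code from the drivers; the function name frame_intervals and its parameters are made up for this illustration. It assumes that every frame is held back until the first vertical retrace at or after the moment the frame is due.

```c
/* Fill intervals[] with the time (in ms, rounded) between successive
 * displayed frames when each frame waits for the next vertical retrace.
 * Time is counted in "ticks" with fps * refresh_hz ticks per second, so a
 * frame is due every refresh_hz ticks and a retrace occurs every fps ticks;
 * this keeps the arithmetic exact. (Hypothetical helper, illustration only.) */
static void frame_intervals(int fps, int refresh_hz, int count, int *intervals)
{
    long prev_retrace = 0;  /* retrace number at which the previous frame appeared */
    int i;

    for (i = 1; i <= count; i++) {
        long due = (long)i * refresh_hz;       /* tick at which frame i is due */
        long retrace = (due + fps - 1) / fps;  /* first retrace at or after that tick */
        long gap = retrace - prev_retrace;     /* retrace periods since the last frame */

        /* convert retrace periods to milliseconds, rounded to nearest */
        intervals[i - 1] = (int)((gap * 1000 + refresh_hz / 2) / refresh_hz);
        prev_retrace = retrace;
    }
}
```

Calling frame_intervals(25, 60, 10, buffer) gives the first ten values of the sequence above. With 20 or 30 fps instead of 25, every interval comes out the same (50 ms or 33 ms respectively).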

A nice solution would be to synchronize the frame rate at which the animation runs to the vertical retrace frequency. In the above example, if you can adjust the animation to run at 20 or 30 fps, instead of 25 fps, the frame rate neatly coincides with the vertical retrace frequency. This is not always an option, and if it is, it still requires you to programmatically find out what the vertical retrace frequency is on a particular monitor.

Why not simply use DirectX?

If you are asking yourself this question, you probably can use the WaitForVerticalBlank() method from DirectDraw. However, for a few projects that I am doing (screen-savers and similar small stuff), I cannot rely on the availability of DirectDraw on the customer's machine, but I still want the best animation fluency. Note that most of my work runs in windowed mode and many of the merits of DirectDraw go away when you run in windowed mode: flipping is not supported and blits are not synchronized to a vertical retrace by default. In addition, DirectDraw takes a few seconds while starting up, which can be an irritating handicap in small utilities (or screen savers), which should run in a snap.

DirectDraw offers the methods GetMonitorFrequency() and WaitForVerticalBlank(). A problem with GetMonitorFrequency() is that the DirectDraw layer forwards the question to the DirectDraw display driver and the drivers that I tested (on the machines that I have access to in my office) do not support this call. This makes GetMonitorFrequency() of limited use.

On the other hand, you can create your own implementation of GetMonitorFrequency(), based on the WaitForVerticalBlank() method (see below).

Detecting the vertical retrace, the basic operation

The following assembler snippet waits until bit 3 in the "Input Status Register 1" of the VGA (compatible) adapter signals that the vertical retrace interval is active.

Detecting vertical retrace

        mov     dx, 3dah        ; VGA input status register
    vretrace_loop:
        in      al, dx
        test    al, 8           ; bit 3 set?
        jz      vretrace_loop   ; no, continue waiting

The trouble is, the I/O port read operation may be trapped and you will not get reliable results unless you run at ring 0. To run in ring 0, you normally write a VxD (Windows 95/98) or a kernel driver (Windows NT, Windows 2000 and later). There is also a way to run in ring 0 without creating a low level driver; more on this later.

Several messages and articles that I have found online claim that you only need a driver for Windows NT; Windows 95/98 and Windows 3.x let the I/O read operation pass.
   When I first tested this on my development machine with Windows 98 installed, this appeared incorrect: bit 3 (which should indicate a vertical retrace) appeared to be set at a fixed interval that is unrelated to the vertical retrace. When checking another computer with Windows 95 installed, the port appeared not to be trapped. A third machine (running Windows 98) also indicated no port trapping.
   Faced with these inconclusive results, I started dumping the "Task State Segment" (TSS) on the three machines, and I saw confirmation for what I expected: on Windows 95/98, the Input Status Register port may, or may not, be trapped by the display driver(s). I do not have enough machines running Windows 3.x to make any conclusive remarks, but I guess the same principle applies to this variety of Microsoft Windows as well.
   In conclusion: you really need to run at ring 0 to get reliable results, or you need a way to control the trapping of the Input Status Register yourself (more on this later).

So basically, you write a VxD and/or a kernel driver, load it and call the above polling loop via DeviceIoControl().

An alternative port to poll for the vertical retrace is the "vertical interrupt pending" bit in the "Input Status Register 0" (port 0x3C2). The disadvantages are that 1) quite a few video cards do not support this bit and 2) a device driver that uses this bit also clears it, so there is a chance of missing the retrace signal.

What to do if you drop in the middle of a vertical retrace?

When you enter the loop above and the vertical retrace happens to be active right away, you do not know whether you are at the beginning of the vertical retrace or near its end. So should you return immediately in order not to waste any more of the preciously short vertical retrace period, or should you wait for the start of the next vertical retrace?

It depends. If you suspect that the amount of screen update that you have to do will not fit in the vertical retrace period anyway, it is probably better to exit the loop immediately. This tends to avoid the timing irregularities mentioned above. If, on the contrary, you think that you can finish the screen update within a vertical retrace period, waiting for the next retrace guarantees that no tearing will occur.

So in a driver, I suggest that you implement both.
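The two variants might be sketched in C as below. This is an illustration under assumptions, not the code from the archive: the read_status_fn hook is hypothetical and stands for the IN instruction on port 3DAh executed at ring 0, and the fake_port replay exists only so that the logic can be tried outside a driver.

```c
/* Bit 3 of Input Status Register 1 (port 3DAh): vertical retrace active. */
#define VRETRACE_BIT 0x08

/* Hypothetical hook that returns the current value of Input Status Register 1. */
typedef unsigned char (*read_status_fn)(void *context);

/* Variant 1: return as soon as a retrace is active, even if it already started. */
static void wait_retrace_any(read_status_fn read_status, void *context)
{
    while ((read_status(context) & VRETRACE_BIT) == 0)
        ;   /* not in retrace yet, keep polling */
}

/* Variant 2: return only at the start of a retrace. If a retrace is already
 * in progress, let it finish first and then wait for the next one. */
static void wait_retrace_start(read_status_fn read_status, void *context)
{
    while ((read_status(context) & VRETRACE_BIT) != 0)
        ;   /* skip the retrace that is under way */
    while ((read_status(context) & VRETRACE_BIT) == 0)
        ;   /* wait for the next retrace to begin */
}

/* Stand-in for the hardware: replays a fixed list of status bytes
 * (repeating the last one) and counts the reads. */
struct fake_port {
    const unsigned char *values;
    int length;
    int reads;
};

static unsigned char fake_port_read(void *context)
{
    struct fake_port *p = (struct fake_port *)context;
    int index = p->reads < p->length ? p->reads : p->length - 1;

    p->reads++;
    return p->values[index];
}
```

Neither loop has a time-out here, so both hang if the bit never changes state; a driver needs a guard against that.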

Perhaps it is naive to think that Microsoft Windows can do anything quickly enough for a vertical retrace, but the hardware (CPU, memory bus, etc.) is getting scarily fast these days.

What if some (future) video card does not support a vertical retrace bit at the particular I/O port address?

Although I have not yet seen any video cards that do not support the vertical retrace bit or the Input Status Register, they may well exist. In a multiple monitor setup (called "multihead" or "dualhead"), moreover, it is likely that the Input Status Register of one of the adapters/channels has been moved to a different I/O address.

To protect the tiny loop that I gave above from entering an endless loop, you must build some kind of time-out into the loop. The method that you choose to get a time stamp (or a time-out event) should have the lowest possible overhead, because you would otherwise risk missing the start of the vertical retrace period (if the function call takes half of a millisecond, you risk missing the vertical retrace altogether).

In a VxD, you can get the address of a memory location that keeps the number of timer ticks since the start up of the machine by calling Get_System_Time_Address. The DWORD memory location gives the time since start up in milliseconds, but it is incremented in steps of 55 ms (an interval that is all too familiar if you are a veteran programmer). Under Windows NT, I have not found a similar routine to query the address of an internal memory location that keeps the current time, but NT does provide a convenient set of timer functions at the kernel level: with KeInitializeTimer() and KeSetTimer() you can create a timer that enters a "signalled" state after its period expires. Then, a call to KeReadStateTimer() in the loop for vertical retrace detection is an efficient way to check for a time-out.
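A time-out based on a tick counter in memory could look roughly like the sketch below. It is written as portable C under assumptions: ms_counter stands for a millisecond counter that the system updates (such as the location that Get_System_Time_Address returns in a VxD), read_status_fn is a hypothetical hook for the port read, and the fake_adapter simulation is only there to exercise the logic outside a driver.

```c
#define VRETRACE_BIT 0x08   /* bit 3 of Input Status Register 1 */

/* Hypothetical hook that returns the current value of Input Status Register 1. */
typedef unsigned char (*read_status_fn)(void *context);

/* Poll for the retrace bit, but give up after timeout_ms milliseconds.
 * Checking the time costs a single memory read per iteration.
 * Returns 1 if a retrace was seen and 0 on a time-out. */
static int wait_retrace_timeout(read_status_fn read_status, void *context,
                                const volatile unsigned long *ms_counter,
                                unsigned long timeout_ms)
{
    unsigned long start = *ms_counter;

    for (;;) {
        if (read_status(context) & VRETRACE_BIT)
            return 1;
        /* unsigned subtraction stays correct when the counter wraps around */
        if (*ms_counter - start >= timeout_ms)
            return 0;
    }
}

/* Simulation: every port read advances a fake clock by one millisecond and
 * reports a retrace from retrace_at_ms onward (never, if that field is 0). */
struct fake_adapter {
    unsigned long now_ms;
    unsigned long retrace_at_ms;
};

static unsigned char fake_adapter_read(void *context)
{
    struct fake_adapter *a = (struct fake_adapter *)context;

    a->now_ms++;
    if (a->retrace_at_ms != 0 && a->now_ms >= a->retrace_at_ms)
        return VRETRACE_BIT;
    return 0;
}
```

With a counter that advances in steps of 55 ms, the effective time-out is only accurate to that granularity, so the value chosen for timeout_ms should be generous.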

Getting the relocated address for the Input Status Register on a multihead setup appears to require hardware-specific code: no BIOS or VESA interface has been agreed upon. To check on which monitor you are running, you can, of course, use the Win32 functions MonitorFromWindow(), MonitorFromRect() and MonitorFromPoint() (these are not supported on Windows 95).

Multi-tasking issues

Time-sliced multi-threading presents another problem: when you are waiting for the start of a vertical retrace, you do not want a different thread or task to have the processor's attention while a vertical retrace is happening.

The threads within a system are organized into prioritized queues. The thread scheduler allows a thread a predetermined amount of execution time (a "quantum"). After a thread has used up its quantum, the scheduler selects a new "runnable" thread from the queue with threads at the highest priority (this could be the same thread, if that thread is at the highest priority level and it is the only thread at that level). Given that the number of threads ready for execution changes frequently and that the threads required by the operating system are assigned high priorities, the period between two time slices in the thread that waits for the vertical retrace can vary dramatically.

I could not find any documentation from Microsoft on the duration of a quantum for Windows 95/98 and Windows NT. Mark Russinovich from www.sysinternals.com claims that:

"On NT Server, the quanta are fixed for both foreground and background processes at 120 ms, and on NT Workstation a background process has a quantum of 20 ms and a foreground process has a quantum of 20, 40 or 60 ms (the foreground boost slider in the Performance tab of the System applet in the Control Panel determines which)."

Windows 95 has a quantum of 20 ms (according to the book "Windows 95 System Programming Secrets"). Windows 2000 allows the quantum to be adjusted via the control panel. Windows CE is documented (by Microsoft) to use a fixed quantum of 25 ms.

The length of the vertical retrace period depends on the refresh rate and the resolution of the display; a ballpark measure is around 50 µs (microseconds). The original VGA design (640×480, 60 Hz) had a vertical retrace period of 64 µs, but at higher resolutions or higher refresh frequencies, the vertical retrace period is shorter. In short: miss a time slice due to concurrently running tasks, and you will probably miss the vertical retrace.

One way to fix this is to increase the priority of the thread that waits for the vertical retrace, so that your thread receives time slices more frequently (and, it is hoped, more regularly). To temporarily boost the thread's priority, use code similar to the snippet below:

Boosting a thread's priority

  HANDLE hProcess = GetCurrentProcess();
  HANDLE hThread = GetCurrentThread();
  /* save the current priority */
  DWORD ClassPriority = GetPriorityClass(hProcess);
  int ThreadPriority = GetThreadPriority(hThread);

  /* boost the priority to the highest level */
  SetPriorityClass(hProcess, REALTIME_PRIORITY_CLASS);
  SetThreadPriority(hThread,  THREAD_PRIORITY_TIME_CRITICAL);

  /* wait for vertical retrace */
  ...

  /* reset the priority */
  SetPriorityClass(hProcess, ClassPriority);
  SetThreadPriority(hThread,  ThreadPriority);

This is not a fail-safe solution: you may still lose a time-slice and miss the vertical retrace. In my experience, however, it worked well. Max Fomitchev wrote in his article for Dr. Dobb's Journal that Windows 95/98, regardless of the priority boost, does not suspend system processes, whereas Windows NT does. To improve the chance that you see a vertical retrace in Windows 95/98, protract the time-slices of the thread that waits for the retrace (using Adjust_Execution_Time from inside the VxD). Mark Russinovich found a trick to protract a time-slice indefinitely under Windows NT 4.0 without changing the thread priority (this "feature" is probably a bug in Windows NT). If you really cannot afford to miss the vertical retrace, there is only one solution left: disable the interrupts while you are waiting for the retrace (remember that you are running in ring 0). While this works, blocking interrupts for longer than a millisecond may cause other components, such as a modem, to malfunction.

By the way, as you see in the snippet above, in order to get a high priority thread all the other threads in the process are boosted too (albeit not as much). This is unnecessary and unfortunate, but Microsoft Windows is designed that way. There is a way to get a high priority thread in a normal priority application under Windows NT; see the article by Jeff Claar.

Determining the vertical refresh frequency

Once you have a routine that waits for the (next) vertical retrace, determining how many vertical retraces fit in a specific period is fairly easy: count the number of retraces until that period expires. The only issue I would like to point out is that after you have seen a vertical retrace, the next one will occur at least five milliseconds later. Hence, it may be a good idea to pause your thread for 5 ms, especially if your thread is running at a high priority. To do this, call Sleep(5) directly after seeing a vertical retrace.

If you count the number of vertical retraces that occur in one second, you should get the vertical retrace frequency.
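As a rough sketch of the counting procedure (an illustration, not the code from the test program): both hooks are hypothetical, with wait_retrace standing for the driver call that blocks until the next retrace starts, optionally followed by the 5 ms pause, and clock_ms for any millisecond clock. The fake_display simulation only serves to try the arithmetic.

```c
/* Hypothetical hooks supplied by the caller. */
typedef int (*wait_retrace_fn)(void *context);        /* 1 = retrace seen, 0 = time-out */
typedef unsigned long (*clock_ms_fn)(void *context);  /* current time in milliseconds */

/* Count the retraces during (at least) period_ms milliseconds and return the
 * refresh frequency in Hz, rounded; returns 0 if no retrace can be detected. */
static unsigned refresh_rate_hz(wait_retrace_fn wait_retrace, clock_ms_fn clock_ms,
                                void *context, unsigned long period_ms)
{
    unsigned long start, elapsed;
    unsigned long count = 0;

    if (period_ms == 0 || !wait_retrace(context))
        return 0;                  /* also aligns the measurement on a retrace */
    start = clock_ms(context);
    do {
        if (!wait_retrace(context))
            return 0;
        count++;
        elapsed = clock_ms(context) - start;
    } while (elapsed < period_ms);

    /* "count" retrace periods took "elapsed" milliseconds */
    return (unsigned)((count * 1000 + elapsed / 2) / elapsed);
}

/* Simulation: a display that starts a retrace every period_us microseconds. */
struct fake_display {
    unsigned long now_us;
    unsigned long period_us;
};

static int fake_display_wait(void *context)
{
    struct fake_display *d = (struct fake_display *)context;

    d->now_us = (d->now_us / d->period_us + 1) * d->period_us;  /* jump to next retrace */
    return 1;
}

static unsigned long fake_display_clock(void *context)
{
    return ((struct fake_display *)context)->now_us / 1000;
}
```

Dividing by the time that actually elapsed, rather than by the requested period, keeps the result reasonable when the last retrace arrives slightly after the period has expired.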

Windows NT can give you the vertical refresh rate of a monitor with the following code snippet:

Determining the refresh frequency under Windows NT

  int freq;
  HDC hdc;                              /* device context of the screen */
  hdc = GetDC(NULL);
  freq = GetDeviceCaps(hdc, VREFRESH);  /* vertical refresh rate in Hz */
  ReleaseDC(NULL, hdc);

GetDeviceCaps() probably reads the value from the device driver, which also means that some drivers may not support it. In my tests, GetDeviceCaps() always returned zero on Windows 95/98.

Reducing the overhead of DeviceIoControl

I expected the IN instruction to take only a few cycles (which is not exactly true; I will come to this later). So, if a large number of instructions is needed to get from the vertical retrace detection loop back to the code that needs to do the work during that vertical retrace, the interval available for the work is shortened unnecessarily.

As an experiment, I stepped through the route back of a DeviceIoControl() call in Windows 98: the path from the exit point of the VxD to the program that called DeviceIoControl(). I expected some 40 to 50 instructions to be executed in between. If this were the case, it would be a nice excuse to spelunk how to avoid DeviceIoControl(). I stopped counting when I had counted well over 150 instructions, including many memory accesses. (Windows NT turned out to have less overhead than Windows 95/98 when I ran the same procedure under that operating system.)

One way to avoid the overhead is to jump to ring 0 and implement the loop in a callback function. There are techniques to execute at the ring 0 level without calling into a VxD or a kernel driver. Port I/O is not trapped in ring 0. On the other hand, if you could force Windows to not trap the "Input Status Register" of the VGA adapter in ring 3, you are set. For Windows 95/98, marking a port as "not trapped" is done by calling Disable_Local_Trapping (in a VxD). There is a danger in this technique: a driver that asked to trap a port might have a reason to do so. To achieve the same in Windows NT, you have to resort to undocumented techniques; see (for example) Graham Wideman's article on port I/O under Windows NT. The beauty is that, after having enabled the port, you can unload the driver. Windows will not re-enable the trapping of the port.

In addition to the caveat mentioned in the previous paragraph, direct port I/O (from ring 3) does not have that much of an advantage over going through DeviceIoControl(): Rick Booth reports that he measured an IN instruction at 66 cycles (on a Pentium 100 MHz) instead of the documented 7 cycles (in ring 0), due to wait states inserted by the bus. My own measurements give 163 cycles per IN instruction (on a Pentium II 300 MHz). Add to that the approximately 20 cycles of overhead when issuing the IN instruction from ring 3, and the overhead of a DeviceIoControl() call becomes "reasonable".

In closing

Many of the topics I have touched upon here could be elaborated on. In time, I hope to research this topic further and update this paper with my findings.

One of the issues that I have ignored so far is the use of the "vertical nondisplay" period instead of the vertical retrace period. Before the vertical retrace period starts, the "display enable" bit in the Input Status Register is turned off and it stays off until after the vertical retrace ends. The vertical nondisplay period thus starts earlier than the vertical retrace period. Detecting vertical nondisplay is not as simple as the vertical retrace, because the "display enable" bit in the Input Status Register is also off during a horizontal retrace. The only criterion that you have to distinguish the vertical nondisplay from a horizontal retrace is that the vertical nondisplay lasts quite a bit longer. That is, you have to time the duration that the display enable bit stays off and if that period exceeds some limit, you can conclude that vertical nondisplay has started. Fortunately, the internal time stamp registers of the Pentium processors give us very precise time stamps.
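The timing test described above might be sketched as follows. Everything here is an assumption for the sake of illustration: display_enabled() is a hypothetical hook that stands for a read of the display enable bit (its bit position and polarity must be checked against the documentation of the adapter), timestamp() stands for a read of the processor's time stamp counter, and the fake_raster simulation exists only so that the logic can be tried outside a driver.

```c
/* Hypothetical hooks supplied by the caller. */
typedef int (*display_enabled_fn)(void *context);        /* nonzero while pixels are shown */
typedef unsigned long long (*timestamp_fn)(void *context);  /* e.g. time stamp counter */

/* Return once the display has been off for longer than "threshold" time
 * units, which must exceed the duration of a horizontal retrace. */
static void wait_vertical_nondisplay(display_enabled_fn display_enabled,
                                     timestamp_fn timestamp, void *context,
                                     unsigned long long threshold)
{
    for (;;) {
        unsigned long long off_since;

        while (display_enabled(context))
            ;                               /* wait until the display goes off */
        off_since = timestamp(context);
        while (!display_enabled(context)) {
            if (timestamp(context) - off_since > threshold)
                return;                     /* too long for a horizontal retrace */
        }
        /* the display came back on: that was only a horizontal retrace */
    }
}

/* Simulation: 10 scan lines of 100 ticks each. The last 20 ticks of every
 * line are the horizontal retrace, and lines 8 and 9 are not displayed at
 * all. Each poll of the display state advances the time by one tick. */
struct fake_raster {
    unsigned long long tick;
};

static int fake_raster_enabled(void *context)
{
    struct fake_raster *r = (struct fake_raster *)context;
    unsigned long long t = r->tick++;

    return (t / 100) % 10 < 8 && t % 100 < 80;
}

static unsigned long long fake_raster_timestamp(void *context)
{
    return ((struct fake_raster *)context)->tick;
}
```

In the simulation a threshold of 30 ticks is enough, because a horizontal retrace lasts only 20 ticks. On real hardware the threshold would have to be derived from the horizontal scan frequency of the current video mode.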

The paper more or less covers the topics that I encountered when trying to implement the simple loop for vertical retrace detection. As you can see, the actual detection loop is quite small; it is mostly the surrounding multi-tasking environment that requires our attention.

References

Booth, Rick; "Inner Loops"; Addison-Wesley; 1997; ISBN 0-201-47960-5. A fine book on optimization by someone who went through the trouble of actually measuring the instruction timings one-by-one.
Claar, Jeff; "Beating NT's Priority Scheme"; Windows Developer's Journal; volume 10, issue 2 (February 1999). By default, an application cannot have a thread in real-time priority if the complete application is not of the "real-time" class. This is inconvenient, because it will not let you create a "normal" program (that does not interfere with the performance of the operating system itself) with a single "monitor" thread. This article supplies a solution, using an undocumented function.
Finnegan, James; "Pop Open a Privileged Set of APIs with Windows NT Kernel Mode Drivers"; Microsoft Systems Journal; March 1998. This article is (still) included on MSDN and is available online at: http://www.microsoft.com/msj/0398/driver.htm. The article discusses how to build and use a kernel driver, and how to install it programmatically.
Fomitchev, Max I; "MMX Technology Code Optimization"; Dr. Dobb's Journal; volume 24, issue 9 (September 1999). Although specifically covering MMX, the article has good advice on optimization in general and contains hard-won titbits about the peculiarities of the various Win32 APIs.
Pietrek, Matt; "Windows 95 System Programming Secrets"; IDG Books; 1995; ISBN 1-56884-318-6; pp. 43-44. A remarkable book: the author gained much of the information from disassembling Microsoft Windows 95. The reference to the quantum is on pages 43-44.
Riemersma, Thiadmer; "Detecting Vertical Retrace"; Windows Developer's Journal; volume 11, issue 8 (August 2000).
Russinovich, Mark; "Win2K Quantums". This paper also covers the time slice quanta that Microsoft Windows NT uses.
Russinovich, Mark; "Systems Internals Tips and Trivia". One of the tips is a simple trick to create a never ending quantum under Microsoft Windows NT. In other words, you can simply shut multi-tasking off completely. The essential part of this trick is that Windows NT (unlike Windows 95/98) resets the quantum of a thread when it calls SetThreadPriority(), even if this call does not change the thread's priority.
Wideman, Graham; "Hardware I/O Port Programming with Delphi and NT". This paper presents an update to Dale Roberts' "GIVIO.SYS" driver (Dr. Dobb's Journal, May 1996), which allows you to enable a specific port for direct port I/O (rather than to enable all ports at once).
