A tiny DrawHTML() function


The DrawHTML() function is nearly a drop-in replacement for the standard DrawText() function, with limited support for HTML formatting tags. It is only "nearly" a replacement, because a few formatting flags of the real DrawText() function are not supported. More limiting, perhaps, is that only a minimal subset of HTML tags is supported. A link to the downloadable files is at the bottom of this page.

DrawHTML() is inspired by Pocket HTML by Petter Hesselberg, published in Windows Developer Journal, February 2000. The implementation is fully mine, however, because I felt that the HTML parser in Pocket HTML leaves something to be desired. My implementation is very limited, but already more complete than that of Pocket HTML; it is more easily extensible; and it scales better by allocating resources on an "as-needed" basis, rather than grabbing all possibly needed resources on start-up of the function.

Why DrawHTML() when there are full HTML parsers with comprehensive support for all tags? My reasons for implementing this are:

Using the code

The function prototype for DrawHTML() is the same as that of the standard Win32 SDK function DrawText(). In your program, you would use DrawHTML() just like you would use DrawText().

A typical use in an application might look like:

Using DrawHTML()
static void Cls_OnPaint(HWND hwnd)
{
  PAINTSTRUCT PaintStruct;
  BeginPaint(hwnd, &PaintStruct);
  HFONT hfontOrg = (HFONT)SelectObject(PaintStruct.hdc, hfontBase);

  RECT rc;
  GetClientRect(hwnd, &rc);
  SetRect(&rc, rc.left + Margin, rc.top + Margin,
               rc.right - Margin, rc.bottom - Margin);

  DrawHTML(PaintStruct.hdc,
           "<p>Beauty, success, truth ..."
           "<br><em>He is blessed who has two.</em>"
           "<br><font color='#C00000'><b>Your program"
           " has none.</b></font>"
           "<p><em>Ken Carpenter</em>",
           -1,
           &rc,
           DT_WORDBREAK);

  SelectObject(PaintStruct.hdc, hfontOrg);
  EndPaint(hwnd, &PaintStruct);
}

There is a bit of scaffolding code around the call to DrawHTML(), to offset the text from the frame of the window and to select a bigger font. The font, hfontBase, is created elsewhere (not shown).

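As I wrote already, the HTML support by DrawHTML() is very limited: it recognizes only the tags listed in the parser's tag table, which are <b>, <br>, <em>, <font> (with a color parameter), <i>, <p>, <strong> and <u>.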

DrawHTML() is Unicode-compatible, but not in the way a web browser is: instead of using an 8-bit encoding for the Unicode data (UTF-8), you just pass in a "wide character" string. For Unicode support, compile the DrawHTML() source code with the UNICODE and _UNICODE macros defined.

The code for DrawHTML() consists of three blocks:

  1. There is a simple parser, consisting of the functions GetToken(), ParseColor() and HexDigit().
  2. The text drawing function consisting of GetFontVariant() and DrawHTML().
  3. A simple small colour stack for the colours set with the <font> tag.

The parser

The parser is fairly strict, and it has a fall-back in that everything that it does not recognize is "plain text". This includes unknown tags, and there, DrawHTML() differs from browsers, which ignore unknown HTML tags.

DrawHTML(): the parser
#define ENDFLAG   0x100
enum { tNONE, tB, tBR, tFONT, tI, tP, tU, tNUMTAGS };
struct {
  LPCTSTR mnemonic;   /* tag name; TCHAR-based to match the _T() literals */
  short token,param,block;
} Tags[] = {
  { NULL,         tNONE, 0, 0},
  { _T("b"),      tB,    0, 0},
  { _T("br"),     tBR,   0, 1},
  { _T("em"),     tI,    0, 0},
  { _T("font"),   tFONT, 1, 0},
  { _T("i"),      tI,    0, 0},
  { _T("p"),      tP,    0, 1},
  { _T("strong"), tB,    0, 0},
  { _T("u"),      tU,    0, 0},
};

static int GetToken(LPCTSTR *String, int *Size, int *TokenLength, BOOL *WhiteSpace)
{
  LPCTSTR Start, EndToken;
  int Length, EntryWhiteSpace, Index, IsEndTag;

  assert(String != NULL && *String != NULL);
  assert(Size != NULL);
  Start = *String;

  /* check for leading white space, then skip it */
  if (WhiteSpace != NULL) {
    EntryWhiteSpace = *WhiteSpace;
    *WhiteSpace = EntryWhiteSpace || _istspace(*Start);
  } else {
    EntryWhiteSpace = FALSE;
  } /* if */
  while (*Size > 0 && _istspace(*Start)) {
    Start++;
    *Size -= 1;
  } /* while */
  if (*Size <= 0)
    return -1;  /* no printable text left */

  EndToken = Start;
  Length = 0;
  IsEndTag = 0;
  if (*EndToken == _T('<')) {
    /* might be a HTML tag, check */
    EndToken++;
    Length++;
    if (Length < *Size && *EndToken == _T('/')) {
      IsEndTag = ENDFLAG;
      EndToken++;
      Length++;
    } /* if */
    while (Length < *Size && !_istspace(*EndToken)
           && *EndToken != _T('<') && *EndToken != _T('>'))
    {
      EndToken++;
      Length++;
    } /* while */
    for (Index = sizeof Tags / sizeof Tags[0] - 1; Index > 0; Index--)
      if (!_tcsnicmp(Start + (IsEndTag ? 2 : 1), Tags[Index].mnemonic,
                     _tcslen(Tags[Index].mnemonic)))
        break;
    if (Index > 0) {
      /* so it is a tag, see whether to accept parameters */
      if (Tags[Index].param && !IsEndTag) {
        while (Length < *Size
               && *EndToken != _T('<') && *EndToken != _T('>'))
        {
          EndToken++;
          Length++;
        } /* while */
      } else if (*EndToken != _T('>')) {
        /* no parameters, then '>' must follow the tag */
        Index = 0;
      } /* if */
      if (WhiteSpace != NULL && Tags[Index].block)
        *WhiteSpace = FALSE;
    } /* if */
    if (*EndToken == _T('>')) {
      EndToken++;
      Length++;
    } /* if */
    /* skip trailing white space in some circumstances */
    if (Index > 0 && (Tags[Index].block || EntryWhiteSpace)) {
      while (Length < *Size && _istspace(*EndToken)) {
        EndToken++;
        Length++;
      } /* while */
    } /* if */

  } else {
    /* normal word (no tag) */
    Index = 0;
    while (Length < *Size && !_istspace(*EndToken) && *EndToken != _T('<')) {
      EndToken++;
      Length++;
    } /* while */
  } /* if */

  if (TokenLength != NULL)
    *TokenLength = Length;
  *Size -= Length;
  *String = Start;
  return Tags[Index].token | IsEndTag;
}

static int HexDigit(TCHAR ch)
{
  if (ch >= _T('0') && ch <= _T('9'))
    return ch - _T('0');
  if (ch >= _T('A') && ch <= _T('F'))
    return ch - _T('A') + 10;
  if (ch >= _T('a') && ch <= _T('f'))
    return ch - _T('a') + 10;
  return 0;
}

static COLORREF ParseColor(LPCTSTR String)
{
  int Red, Green, Blue;

  if (*String == _T('\'') || *String == _T('"'))
    String++;
  if (*String == _T('#'))
    String++;
  Red   = (HexDigit(String[0]) << 4) | HexDigit(String[1]);
  Green = (HexDigit(String[2]) << 4) | HexDigit(String[3]);
  Blue  = (HexDigit(String[4]) << 4) | HexDigit(String[5]);
  return RGB(Red, Green, Blue);
}
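
To see what ParseColor() makes of a colour attribute, the following portable sketch repeats the same arithmetic without the Win32 headers. The names hex_digit() and parse_color_sketch() are stand-ins for illustration and are not part of the DrawHTML() source; the result is packed as 0x00BBGGRR, which is the layout that the RGB() macro produces. Unlike the listing above, the sketch also checks that six digits are present before reading them, which is an addition of mine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for HexDigit(): value of one hexadecimal digit, 0 if invalid. */
static int hex_digit(char ch)
{
  if (ch >= '0' && ch <= '9')
    return ch - '0';
  if (ch >= 'A' && ch <= 'F')
    return ch - 'A' + 10;
  if (ch >= 'a' && ch <= 'f')
    return ch - 'a' + 10;
  return 0;
}

/* Stand-in for ParseColor(): accepts RRGGBB with an optional leading quote
 * and '#', and packs red into the low byte, then green, then blue.
 * Returns 0 (black) when fewer than six characters follow (assumption). */
static unsigned long parse_color_sketch(const char *s)
{
  unsigned long red, green, blue;

  assert(s != NULL);
  if (*s == '\'' || *s == '"')
    s++;
  if (*s == '#')
    s++;
  if (strlen(s) < 6)
    return 0;
  red   = (unsigned long)((hex_digit(s[0]) << 4) | hex_digit(s[1]));
  green = (unsigned long)((hex_digit(s[2]) << 4) | hex_digit(s[3]));
  blue  = (unsigned long)((hex_digit(s[4]) << 4) | hex_digit(s[5]));
  return red | (green << 8) | (blue << 16);
}
```

For the attribute value '#C00000' from the usage example, only the red byte is set, so the sketch yields 0x0000C0.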

The handling of white space in HTML has never been very clear to me, but we can make some common-sense rules that work fairly well. In general, multiple spaces (or other white space characters) must be replaced by a single space. A few tags, like <p>, eat up all space. In the parser, these tags are called block tags; see the "block" field in the definition of the structure "Tags". The HTML DTD also makes a distinction between block and inline tags, but not exactly in the same way that I have done (one difference is that the HTML DTD standardizes the syntax, whereas I have to add semantics to it).
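
The first rule, that a run of white space counts as a single space, can be sketched in isolation. The function collapse_spaces() below is a hypothetical helper, not part of DrawHTML(); it works on plain char strings and drops leading and trailing white space altogether, much like a block tag would.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Copy "in" to "out", replacing every run of white space between two words
 * by one space and dropping white space at the start and at the end.
 * The output is truncated (at a word boundary or inside a word) if the
 * buffer is too small; it is always zero-terminated. */
static void collapse_spaces(const char *in, char *out, size_t outsize)
{
  size_t n = 0;
  int pending = 0;   /* white space seen since the last word character */

  assert(in != NULL && out != NULL && outsize > 0);
  for (; *in != '\0' && n + 1 < outsize; in++) {
    if (isspace((unsigned char)*in)) {
      pending = 1;
    } else {
      if (pending && n > 0) {
        if (n + 2 >= outsize)
          break;     /* no room for a space, a character and the terminator */
        out[n++] = ' ';
      }
      pending = 0;
      out[n++] = *in;
    }
  }
  out[n] = '\0';
}
```

DrawHTML() itself never builds such a copy: GetToken() skips the white space and only reports through the WhiteSpace flag that some was seen, after which the drawing code adds the width of one space.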

If a tag does not allow parameters, I do not allow white space in the tag. This is a choice, so that I could more easily identify whether a "<" character starts a tag or whether it should just be printed, like all other "plain" text.

When you wish to display a supported HTML tag, rather than have it interpreted, you have to use a trick. DrawHTML() will fully ignore an empty "end" tag with the syntax "</>". If you use it in a syntax like: "<</>p>", DrawHTML() will interpret the first "<" literally (i.e., as plain text) because it is followed by another < and therefore cannot be the start of a valid HTML tag; the next "</>" is ignored and the remaining "p>" are again interpreted literally. In effect, we have broken a HTML tag into two pieces, which causes it to be displayed as "<p>".
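
The effect of this trick, and of the rule that unknown tags are plain text, can be checked with a much reduced model of the parser. The sketch below is an assumption-laden simplification: tag_length() and strip_tags() are hypothetical names, only parameterless tags are handled, matching is case-sensitive, and recognized tags are simply removed rather than interpreted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of a parameterless tag ("<b>", "</em>", "</>", ...) that starts
 * at s, or 0 if s does not start with such a tag. */
static size_t tag_length(const char *s)
{
  static const char *names[] = { "b", "br", "em", "i", "p", "strong", "u" };
  size_t i, len, pos = 1;

  if (s[0] != '<')
    return 0;
  if (s[pos] == '/') {
    pos++;
    if (s[pos] == '>')
      return pos + 1;            /* the empty end tag "</>" is swallowed */
  }
  for (i = 0; i < sizeof names / sizeof names[0]; i++) {
    len = strlen(names[i]);
    /* the name must be followed directly by '>': no white space allowed */
    if (strncmp(s + pos, names[i], len) == 0 && s[pos + len] == '>')
      return pos + len + 1;
  }
  return 0;                      /* not a known tag: plain text */
}

/* Copy "in" to "out" without the recognized tags. */
static void strip_tags(const char *in, char *out, size_t outsize)
{
  size_t n = 0, skip;

  assert(in != NULL && out != NULL && outsize > 0);
  while (*in != '\0' && n + 1 < outsize) {
    skip = tag_length(in);
    if (skip > 0)
      in += skip;
    else
      out[n++] = *in++;
  }
  out[n] = '\0';
}
```

Feeding "<</>p>" to strip_tags() leaves "<p>": the first "<" is not followed by a tag name, "</>" is swallowed, and "p>" is ordinary text.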

The formatting code

DrawHTML(): main function plus font cache
#define FV_BOLD       0x01
#define FV_ITALIC     (FV_BOLD << 1)
#define FV_UNDERLINE  (FV_ITALIC << 1)
#define FV_NUMBER     (FV_UNDERLINE << 1)

static HFONT GetFontVariant(HDC hdc, HFONT hfontSource, int Styles)
{
  LOGFONT logFont = { 0 };

  SelectObject(hdc, (HFONT)GetStockObject(SYSTEM_FONT));
  if (!GetObject(hfontSource, sizeof logFont, &logFont))
    return NULL;

  /* set parameters, create new font */
  logFont.lfWeight = (Styles & FV_BOLD) ? FW_BOLD : FW_NORMAL;
  logFont.lfItalic = (BYTE)(Styles & FV_ITALIC) != 0;
  logFont.lfUnderline = (BYTE)(Styles & FV_UNDERLINE) != 0;
  return CreateFontIndirect(&logFont);
}

int __stdcall DrawHTML(
                       HDC     hdc,        // handle of device context
                       LPCTSTR lpString,   // address of string to draw
                       int     nCount,     // string length, in characters
                       LPRECT  lpRect,     // address of structure with formatting dimensions
                       UINT    uFormat     // text-drawing flags
                      )
{
  LPCTSTR Start;
  int Left, Top, MaxWidth, MinWidth, Height;
  int SavedDC;
  int Tag, TokenLength;
  HFONT hfontBase, hfontSpecial[FV_NUMBER];
  int Styles, CurStyles;
  SIZE size;
  int Index, LineHeight;
  POINT CurPos;
  int WidthOfSPace, XPos;
  BOOL WhiteSpace;
  RECT rc;

  if (hdc == NULL || lpString == NULL)
    return 0;
  if (nCount < 0)
    nCount = (int)_tcslen(lpString);

  if (lpRect != NULL) {
    Left = lpRect->left;
    Top = lpRect->top;
    MaxWidth = lpRect->right - lpRect->left;
  } else {
    GetCurrentPositionEx(hdc, &CurPos);
    Left = CurPos.x;
    Top = CurPos.y;
    MaxWidth = GetDeviceCaps(hdc, HORZRES) - Left;
  } /* if */
  if (MaxWidth < 0)
    MaxWidth = 0;

  /* clear the flags we do not support, force the ones we rely on */
  uFormat &= ~(DT_BOTTOM | DT_CENTER | DT_RIGHT | DT_TABSTOP | DT_VCENTER);
  uFormat |= (DT_LEFT | DT_NOPREFIX);

  /* get the "default" font from the DC */
  SavedDC = SaveDC(hdc);
  hfontBase = SelectObject(hdc, (HFONT)GetStockObject(SYSTEM_FONT));
  SelectObject(hdc, hfontBase);
  /* clear the other fonts, they are created "on demand" */
  for (Index = 0; Index < FV_NUMBER; Index++)
    hfontSpecial[Index] = NULL;
  hfontSpecial[0] = hfontBase;
  Styles = 0; /* assume the active font is normal weight, roman, non-underlined */

  /* get font height (use characters with ascender and descender);
   * we make the assumption here that changing the font style will
   * not change the font height
   */
  GetTextExtentPoint32(hdc, _T("Ây"), 2, &size);
  LineHeight = size.cy;

  /* run through the string, word for word */
  XPos = 0;
  MinWidth = 0;
  stacktop = 0;
  CurStyles = -1; /* force a select of the proper style */
  Height = 0;
  WhiteSpace = FALSE;

  Start = lpString;
  for ( ;; ) {
    Tag = GetToken(&Start, &nCount, &TokenLength, &WhiteSpace);
    if (Tag < 0)
      break;
    switch (Tag & ~ENDFLAG) {
    case tP:
      if ((Tag & ENDFLAG) == 0 && (uFormat & DT_SINGLELINE) == 0) {
        if (Start != lpString)
          Height += 3 * LineHeight / 2;
        XPos = 0;
      } /* if */
      break;
    case tBR:
      if ((Tag & ENDFLAG) == 0 && (uFormat & DT_SINGLELINE) == 0) {
        Height += LineHeight;
        XPos = 0;
      } /* if */
      break;
    case tB:
      Styles = (Tag & ENDFLAG) ? Styles & ~FV_BOLD : Styles | FV_BOLD;
      break;
    case tI:
      Styles = (Tag & ENDFLAG) ? Styles & ~FV_ITALIC : Styles | FV_ITALIC;
      break;
    case tU:
      Styles = (Tag & ENDFLAG) ? Styles & ~FV_UNDERLINE : Styles | FV_UNDERLINE;
      break;
    case tFONT:
      if ((Tag & ENDFLAG) == 0) {
        if (_tcsnicmp(Start + 6, _T("color="), 6) == 0)
          PushColor(hdc, ParseColor(Start + 12));
      } else {
        PopColor(hdc);
      } /* if */
      break;
    default:
      if (Tag == (tNONE | ENDFLAG))
        break;
      if (CurStyles != Styles) {
        if (hfontSpecial[Styles] == NULL)
          hfontSpecial[Styles] = GetFontVariant(hdc, hfontBase, Styles);
        CurStyles = Styles;
        SelectObject(hdc, hfontSpecial[Styles]);
        /* get the width of a space character (for word spacing) */
        GetTextExtentPoint32(hdc, _T(" "), 1, &size);
        WidthOfSPace = size.cx;
      } /* if */
      /* check word length, check whether to wrap around */
      GetTextExtentPoint32(hdc, Start, TokenLength, &size);
      if (size.cx > MaxWidth)
        MaxWidth = size.cx;   /* must increase width: long non-breakable word */
      if (WhiteSpace)
        XPos += WidthOfSPace;
      if (XPos + size.cx > MaxWidth && WhiteSpace) {
        if ((uFormat & DT_WORDBREAK) != 0) {
          /* word wrap */
          Height += LineHeight;
          XPos = 0;
        } else {
          /* no word wrap, must increase the width */
          MaxWidth = XPos + size.cx;
        } /* if */
      } /* if */
      /* output text (unless DT_CALCRECT is set) */
      if ((uFormat & DT_CALCRECT) == 0) {
        SetRect(&rc, Left + XPos, Top + Height,
                     Left + MaxWidth, Top + Height + LineHeight);
        DrawText(hdc, Start, TokenLength, &rc, uFormat);
      } /* if */
      /* update current position */
      XPos += size.cx;
      if (XPos > MinWidth)
        MinWidth = XPos;
      WhiteSpace = FALSE;
    } /* switch */

    Start += TokenLength;
  } /* for */

  RestoreDC(hdc, SavedDC);
  for (Index = 1; Index < FV_NUMBER; Index++) /* do not erase hfontSpecial[0] */
    if (hfontSpecial[Index] != NULL)
      DeleteObject(hfontSpecial[Index]);

  /* store width and height back into the lpRect structure */
  if ((uFormat & DT_CALCRECT) != 0 && lpRect!=NULL) {
    lpRect->right = lpRect->left + MinWidth;
    lpRect->bottom = lpRect->top + Height + LineHeight;
  } /* if */

  return Height;
}

The GetFontVariant() function deselects the current font from the DC. I did this because the font could be the base font, and GetObject() may fail when called on an object that is currently selected in a DC.
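
The font cache in the listing above is filled on demand: the style bits form an index into hfontSpecial[], and a variant is created only the first time its combination of styles is used. A minimal sketch of that pattern without GDI follows; the FontCache type, make_variant() and the integer "handles" are hypothetical stand-ins for the HFONT array and GetFontVariant().

```c
#include <assert.h>
#include <stddef.h>

#define FV_BOLD       0x01
#define FV_ITALIC     (FV_BOLD << 1)
#define FV_UNDERLINE  (FV_ITALIC << 1)
#define FV_NUMBER     (FV_UNDERLINE << 1)   /* 8 combinations of 3 bits */

typedef struct {
  int handle[FV_NUMBER];   /* 0 = variant not created yet */
  int created;             /* number of variants created so far */
} FontCache;

/* Stand-in for GetFontVariant(): returns a fake, non-zero handle. */
static int make_variant(int styles)
{
  return 100 + styles;
}

/* Return the variant for "styles", creating it on first use only. */
static int get_font(FontCache *cache, int styles)
{
  assert(cache != NULL && styles >= 0 && styles < FV_NUMBER);
  if (cache->handle[styles] == 0) {
    cache->handle[styles] = make_variant(styles);
    cache->created++;
  }
  return cache->handle[styles];
}
```

A text that only ever uses bold and bold-italic therefore creates two fonts, not seven, which is the "as-needed" allocation mentioned in the introduction.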

Also apparent in the source code for the DrawHTML() function is that there are "formatting flags" of the DrawText() function that DrawHTML() does not support. These are related to alignment (horizontal and vertical) and to setting tab stops. Supporting horizontal and vertical alignment requires an extra pass over the text, to get the full height and the width of each individual line. Specifically, the following formatting flags of the DrawText() function are not supported:

Flag         Description
DT_CENTER    centre text lines horizontally
DT_RIGHT     align text lines to the right border
DT_TABSTOP   set custom tab stops (the default is 8 characters per tab)

These three flags are ignored if they are set; the code also clears the vertical-alignment flags DT_VCENTER and DT_BOTTOM. The "&" character is never a "prefix character" in DrawHTML(), so the DT_NOPREFIX flag is not necessary.
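
The flag handling itself is two lines of bit arithmetic: clear what is unsupported, then force what the function relies on. The sketch below isolates it in portable C. The SK_ constants are local stand-ins defined only for this example (a real program takes the DT_ values from the Windows headers), and sanitize_format() is a hypothetical name.

```c
#include <assert.h>

/* Local stand-ins for the DT_ flags; defined here for the sketch only. */
enum {
  SK_LEFT      = 0x0000,
  SK_CENTER    = 0x0001,
  SK_RIGHT     = 0x0002,
  SK_VCENTER   = 0x0004,
  SK_BOTTOM    = 0x0008,
  SK_WORDBREAK = 0x0010,
  SK_TABSTOP   = 0x0080,
  SK_CALCRECT  = 0x0400,
  SK_NOPREFIX  = 0x0800
};

/* Drop the unsupported flags and force left alignment without prefix
 * handling, in the same way as the start of DrawHTML(). */
static unsigned sanitize_format(unsigned format)
{
  format &= ~(unsigned)(SK_BOTTOM | SK_CENTER | SK_RIGHT
                        | SK_TABSTOP | SK_VCENTER);
  format |= (unsigned)(SK_LEFT | SK_NOPREFIX);
  return format;
}
```

A caller that passes the centre and word-break flags thus ends up with word-break and no-prefix: the unsupported flag disappears silently rather than causing an error.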

More noteworthy, in fact, is that all the other flags are supported, specifically the flags DT_SINGLELINE, which causes the tags <p> and <br> to be ignored, and DT_CALCRECT, which calculates the bounding rectangle for the text without actually drawing it. Compatibility with DrawText() is furthermore improved by using DrawText() in the back end to actually draw the text after having parsed the HTML code.
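
DT_CALCRECT turns DrawHTML() into a measuring pass: the same word-wrapping walk runs, but nothing is drawn. The sketch below mirrors that walk in portable C under a strong simplifying assumption, namely that every character is one unit wide and that the text contains no tags. The Extent type and measure_wrapped() are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef struct {
  int width;   /* width of the widest line, in character units */
  int lines;   /* number of lines used */
} Extent;

/* Measure "text" when wrapped at max_width units, with word-break on. */
static Extent measure_wrapped(const char *text, int max_width)
{
  Extent e = { 0, 1 };
  int xpos = 0;

  assert(text != NULL && max_width >= 0);
  for ( ;; ) {
    int space = 0, len = 0;

    while (isspace((unsigned char)*text)) {
      text++;
      space = 1;
    }
    while (text[len] != '\0' && !isspace((unsigned char)text[len]))
      len++;
    if (len == 0)
      break;                  /* nothing but white space left */
    if (len > max_width)
      max_width = len;        /* long non-breakable word widens the box */
    if (space && xpos > 0)
      xpos += 1;              /* one space between words on the same line */
    if (space && xpos + len > max_width) {
      e.lines++;              /* word wrap */
      xpos = 0;
    }
    xpos += len;
    if (xpos > e.width)
      e.width = xpos;
    text += len;
  }
  return e;
}
```

As in DrawHTML(), a word that is wider than the box is never split; the box grows to hold it instead.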

The colour stack

The third section of code is that for a stack of colours. Its purpose is to return to the previous colour when a </font> for a colour is given. When changing a colour, therefore, the old colour must be saved somewhere. Hence, the stack.

DrawHTML(): the colour stack
#define STACKSIZE   8
static COLORREF stack[STACKSIZE];
static int stacktop;

static BOOL PushColor(HDC hdc, COLORREF clr)
{
  if (stacktop < STACKSIZE)
    stack[stacktop++] = GetTextColor(hdc);
  SetTextColor(hdc, clr);
  return TRUE;
}

static BOOL PopColor(HDC hdc)
{
  COLORREF clr;
  BOOL okay = (stacktop > 0);

  if (okay)
    clr = stack[--stacktop];
  else
    clr = stack[0];
  SetTextColor(hdc, clr);
  return okay;
}
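
The behaviour of the stack with nested <font> tags can be traced without a device context. In the sketch below, the ColorState structure is a hypothetical stand-in that keeps the "current text colour" in a plain variable instead of in the DC. It deviates from the listing above in one respect, which is my own choice: an unmatched pop leaves the colour unchanged.

```c
#include <assert.h>
#include <stddef.h>

#define STACKSIZE 8

typedef struct {
  unsigned long saved[STACKSIZE];  /* colours to return to */
  int top;                         /* number of saved colours */
  unsigned long current;           /* stands in for the DC's text colour */
} ColorState;

/* <font color=...>: remember the active colour, then switch. */
static void push_color(ColorState *s, unsigned long clr)
{
  assert(s != NULL);
  if (s->top < STACKSIZE)
    s->saved[s->top++] = s->current;
  s->current = clr;
}

/* </font>: return to the colour that was active before the matching push.
 * Returns 0 for an unmatched end tag, in which case nothing changes. */
static int pop_color(ColorState *s)
{
  assert(s != NULL);
  if (s->top <= 0)
    return 0;
  s->current = s->saved[--s->top];
  return 1;
}
```

With a red <font> nested inside a blue one, the inner </font> brings back blue and the outer one brings back the original colour.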

My tiny HTML implementation is exactly that: tiny. Petter Hesselberg's "Pocket HTML" is tinier still. For adequate support of HTML, you may want to try the QHTML library/control by GipsySoft.

The function DrawHTML() is also an example of how you can customize the text lay-out and formatting of the "Callout control" which was published in Dr. Dobb's Journal of August 2004. This control uses DrawText() to format the contents of the "comic balloon", and it allows you to change the text output function by sending the control a message (sub-classing is not required). The only requirement that the callout control has for any replacement output function is that it is compatible with DrawText().

For the callout control, I initially wanted to make a function DrawRichText() that parses an RTF stream. I was hoping to use the RichEdit ITextServices interface to ease my job. Alas, the Microsoft SDK example that does something remotely similar to what I needed was daunting, and further documentation was sparse. Hence, I turned away from it and opted for (Pocket) HTML instead.

Errata for "Building Callout Controls"

After publication, I noticed that the time-out feature of the callout control does not work correctly. This feature should hide the control automatically when the time expires. There are two bugs:

  1. The timer is never started, so the control cannot detect the time-out; this should have been done in Cls_OnCreate(). Function Cls_OnDestroy() must destroy the timer with a call to KillTimer(), which was also missing.
  2. The "message cracker" function Callout_SetTimeout in the header file passes the time-out value in the wrong parameter (it must be lParam, not wParam).

In addition, in the functions Cls_OnLButtonUp() and Cls_OnTimer() the code first sends the notification message and then hides the callout window. This may cause a failure if the calling application deletes the window on reception of the notification message. It is better to avoid referring to the window handle after sending the notification message. In this context this means: hide the window before sending the notification message that tells the user that the control has been hidden.

Downloads