One of the most annoying issues to surface after you have delivered a site (and users start creating pages and entering content) is the appearance of characters that don’t display correctly. We’ve all encountered this at some stage while browsing the interwebs. Those annoying characters and symbols rear their ugly heads from time to time:

Internet Explorer:
Internet Explorer character encoding issues

Firefox:
Firefox character encoding issues

And this is what we should see:
Firefox no character encoding issues

What is particularly frustrating is that these strange characters appear only after a page has been published to the live website – not within SmartEdit mode.

This occurs when the characters (letters, numbers, symbols) making up your page’s text are not all encoded in the same character set, or when the browser is told to expect a different character set from the one the text was saved in. Some might be in Western European (ISO-8859-1) and others might be saved in Unicode (UTF-8).
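To make the mismatch concrete, here is a minimal Python sketch (nothing to do with RedDot itself) that encodes a sample string one way and decodes it another. The sample text and variable names are my own, chosen purely for illustration.

```python
# Sample text with an accented letter and an en dash (typical Word paste).
text = "Caf\u00e9 \u2013 open"

# Case 1: text saved as UTF-8 but read as Windows-1252, which is what
# browsers commonly use when a page claims to be ISO-8859-1.
utf8_read_as_cp1252 = text.encode("utf-8").decode("cp1252")

# Case 2: text saved as ISO-8859-1 but read as UTF-8. The lone 0xE9 byte
# is not valid UTF-8, so it becomes the U+FFFD replacement character.
latin1_read_as_utf8 = (
    "Caf\u00e9".encode("iso-8859-1").decode("utf-8", errors="replace")
)
```

Printing the first value shows each non-ASCII character replaced by two or three junk characters, while the second shows the familiar question-mark diamond in place of the accented letter.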

To fix this, you need to tell the browser explicitly which character set the page content uses, and make sure that all characters on your page are stored using that same character set. Alternatively, characters that cannot be expressed within a specific character set can be embedded into the page using character entity references (in the form of numeric or named values).
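As a rough illustration of the numeric form, Python’s built-in `xmlcharrefreplace` error handler swaps every character the target character set cannot hold for a numeric character reference. This is only a sketch of the idea, not what the CMS does internally, and the sample string is made up.

```python
# The accented letter exists in ISO-8859-1; the en dash and euro sign do not.
text = "caf\u00e9 10\u201320 \u20ac"

# Characters missing from the target charset become &#NNNN; references,
# everything else is encoded as a normal single byte.
encoded = text.encode("iso-8859-1", errors="xmlcharrefreplace")

print(encoded)
```

The accented letter survives as an ordinary ISO-8859-1 byte, while the en dash and euro sign come out as `&#8211;` and `&#8364;`, which any browser can render regardless of the page’s declared character set.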

It’s particularly important that you specify the correct character set or include the appropriate character entities when publishing RSS feeds out of RedDot CMS, otherwise you will encounter XML parsing issues.
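To see why feeds are less forgiving than web pages, the sketch below feeds three tiny hand-made XML fragments to Python’s standard XML parser. The fragments and the `parses` helper are hypothetical; a real RSS feed would of course carry the full channel and item structure.

```python
import xml.etree.ElementTree as ET


def parses(feed_bytes: bytes) -> bool:
    """Return True if the bytes are well-formed XML, False otherwise."""
    try:
        ET.fromstring(feed_bytes)
        return True
    except ET.ParseError:
        return False


declaration = '<?xml version="1.0" encoding="utf-8"?>'
title = "<title>Caf\u00e9 news</title>"

# Declared UTF-8 and actually stored as UTF-8: fine.
good_feed = (declaration + title).encode("utf-8")

# Declared UTF-8 but stored as ISO-8859-1: the parser rejects the feed.
bad_feed = (declaration + title).encode("iso-8859-1")

# Using a numeric character reference sidesteps the problem entirely.
entity_feed = (declaration + "<title>Caf&#233; news</title>").encode("ascii")

print(parses(good_feed), parses(bad_feed), parses(entity_feed))
```

Where a browser would quietly show a garbled character, an XML parser stops at the first invalid byte, so a single mis-encoded character is enough to break the whole feed.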

Selecting the correct published character set will fix the majority of your issues.

For each language variant, select the appropriate published character set from the drop down list under ‘Edit Language Variant’.

RedDot CMS language variant settings

As a rule of thumb, I would suggest selecting UTF-8 as the character encoding, as it supports many languages and can accommodate pages that mix content in those languages.
Also, ensure that the appropriate declaration is added within your page. For XML (including XHTML), use the encoding pseudo-attribute in the XML declaration at the start of a document or the text declaration at the start of an entity.

<?xml version="1.0" encoding="utf-8" ?>

For HTML or XHTML served as HTML, you should always use the <meta> tag inside <head>. Example:

<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

For XHTML, you need a slash at the end:

<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
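If you want to sanity-check a published file, one rough approach is to read the charset from the `<meta>` tag and confirm the file’s bytes really decode in that charset. The sketch below assumes a simple regular expression is good enough to find the declaration; the function names and sample pages are my own and not part of any RedDot tooling.

```python
import re

# Finds charset=... inside a <meta> tag (quoted or unquoted value).
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE
)


def declared_charset(page: bytes):
    """Return the charset named in the page's <meta> tag, or None."""
    match = META_CHARSET.search(page)
    return match.group(1).decode("ascii").lower() if match else None


def matches_declaration(page: bytes) -> bool:
    """True if the page bytes decode cleanly in the declared charset."""
    charset = declared_charset(page)
    if charset is None:
        return False
    try:
        page.decode(charset)
        return True
    except (UnicodeDecodeError, LookupError):
        return False


html = (
    '<head><meta http-equiv="Content-Type" '
    'content="text/html;charset=utf-8" /></head><p>Caf\u00e9</p>'
)
print(matches_declaration(html.encode("utf-8")))       # stored as declared
print(matches_declaration(html.encode("iso-8859-1")))  # stored differently
```

Note that this check is only useful for UTF-8 declarations: every byte sequence is technically valid ISO-8859-1, so a page declared as ISO-8859-1 will always pass even when its content is garbled.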

NOTE: When using UTF-8, make sure that the BOM (Byte Order Mark) is not included under the publishing target settings. Check out this article for more information about BOM.
RedDot CMS publishing target settings
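To check whether a published file still carries a BOM, you can look at its first three bytes. The following is a small sketch using Python’s standard `codecs` module; the helper names are hypothetical, and in practice you would read the bytes from the published file rather than build them in code.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


# 'utf-8-sig' writes a BOM; plain 'utf-8' does not.
with_bom = "<html></html>".encode("utf-8-sig")
without_bom = "<html></html>".encode("utf-8")

print(has_utf8_bom(with_bom), has_utf8_bom(without_bom))
```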

Convert specific characters into the appropriate character entity

Most developers rely on the ISO 8859-1 (Western European) character set for English-based sites. However, this set does not include characters such as em and en dashes, left and right double quotes, etc., which inevitably find their way into HTML Text placeholders when content is cut and pasted from Word or PDF documents. I’ve found this to be the main cause of most of those annoying character issues.

If you need to include these specific kinds of characters within your page when using a character set other than UTF-8, the HTML Convert table within the CMS will convert them to the appropriate entity so they can be displayed correctly.
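The idea behind such a conversion can be sketched in a few lines of Python. To be clear, this does not reproduce the format or behaviour of RedDot’s HTML Convert table; `to_entities` is a hypothetical helper that uses Python’s standard table of HTML entity names to show the kind of mapping involved.

```python
from html.entities import codepoint2name


def to_entities(text: str, charset: str = "iso-8859-1") -> str:
    """Replace characters the charset cannot hold with HTML entities."""
    out = []
    for ch in text:
        try:
            ch.encode(charset)  # representable: keep the character as-is
            out.append(ch)
        except UnicodeEncodeError:
            # Prefer a named entity, fall back to a numeric reference.
            name = codepoint2name.get(ord(ch))
            out.append(f"&{name};" if name else f"&#{ord(ch)};")
    return "".join(out)


pasted = "\u201cHi\u201d \u2013 caf\u00e9"
converted = to_entities(pasted)
print(converted.encode("iso-8859-1"))
```

The curly quotes and en dash become `&ldquo;`, `&rdquo;` and `&ndash;`, while the accented letter is left alone because ISO-8859-1 can already represent it.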

I’ve attached an HTML Convert table that I use frequently; it includes the most commonly used characters that need to be encoded. (NOTE: I’ve found that this file needs to be saved in ANSI format in order to work within RedDot, although some other users find that Unicode works fine for them. Just make sure you test any changes thoroughly!)

Copy this file into the ‘ASP’ folder where the CMS is installed (typically C:\Program Files\Open Text\WS\MS\ASP). Then, within the Project Variant settings, make sure you specify the file you wish to use:
RedDot CMS project variant settings

For more information about the HTML Convert Table, check out Stefen Buchali’s post.

Your character encoding issues within your RedDot CMS projects should now be a thing of the past!
