Basic Unicode readiness testing for your application
Unicode is a complex, constantly evolving standard, but that is no reason to skip some basic testing to uncover hidden bugs.
Here is a small Unicode string that could be used to test the readiness of your application to deal with Unicode strings. You can use this string to:
- filename - save and load files using this string as part of the filename. Also try a long path of more than 260 characters to find problems caused by older APIs on Windows.
- text input - paste it and check whether the application displays it incorrectly.
  - input cursor - first check how the cursor moves in Notepad on Windows 7 and see whether your application behaves the same. If you see strange character movements or decompositions, you are doing something wrong.
  - selection - as above, check Notepad first and then check whether your application selects text the same way.
- rendering - if your application renders text anywhere, use the string to check that it renders well.
  - text size - are the CJK characters too small to be recognized?
  - bad rendering - an empty rectangle may indicate a missing glyph (the required font is missing). This is not very dangerous, since nobody has all the fonts, but if you see question marks or other strange things you may have a real problem. Ideally your application supports font fallback when displaying text, to avoid the missing-glyph sign.
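
The filename check from the list above could be automated roughly as follows. This is only a minimal sketch: `SAMPLE` is a hypothetical stand-in, not the test string from this post, and `filename_round_trip` is a name made up for illustration. Directory listings are normalized to NFC before comparing, because some file systems store names in decomposed form. The 260-character path check is left out.

```python
import os
import unicodedata

# Hypothetical sample, NOT the test string from the post: a Latin letter with
# a diacritic, a Cyrillic letter, two CJK characters, and one character
# outside the BMP (U+1D11E, musical symbol G clef).
SAMPLE = "caf\u00e9-\u0416-\u4e2d\u6587-\U0001D11E"


def filename_round_trip(directory, text=SAMPLE):
    """Create a file whose name contains `text`; check name and content survive."""
    name = "test_" + text + ".txt"
    path = os.path.join(directory, name)

    # Save: the string is used both in the filename and as the file content.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)

    # Load: the name must show up in the directory listing (compared in NFC,
    # since some file systems return decomposed names).
    listed = {unicodedata.normalize("NFC", entry) for entry in os.listdir(directory)}
    name_ok = unicodedata.normalize("NFC", name) in listed

    with open(path, encoding="utf-8") as handle:
        content_ok = handle.read() == text

    return name_ok and content_ok
```

Calling `filename_round_trip` on a temporary directory and getting `False` (or an exception) would point to a filename-handling problem somewhere between the application and the file system.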
I will post a text file encoded as UTF-8 (with BOM) that contains the test string, because WordPress cuts the article off where it finds a character outside the Unicode BMP.
<img src="http://wp.sbarnea.com/wp-content/uploads/2010/04/042910_1508_BasicUnicod1.png" alt="" />
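
To make the "outside the BMP" and "UTF-8 with BOM" parts concrete, here is a small sketch. It assumes a hypothetical sample string, not the actual test string, and the helper names are invented for illustration. A character is outside the BMP when its code point is above U+FFFF, and Python's `utf-8-sig` codec writes and strips the BOM.

```python
import codecs

# Hypothetical sample, NOT the test string from the post.
SAMPLE = "A\u4e2d\U0001D11E"


def non_bmp_chars(text):
    """Return the characters whose code point lies above U+FFFF (outside the BMP)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def write_with_bom(path, text):
    """Write `text` as UTF-8 with a leading BOM (the 'utf-8-sig' codec adds it)."""
    with open(path, "w", encoding="utf-8-sig") as handle:
        handle.write(text)


def read_with_bom(path):
    """Return (has_bom, text); the BOM is removed from the decoded text."""
    with open(path, "rb") as handle:
        raw = handle.read()
    has_bom = raw.startswith(codecs.BOM_UTF8)
    return has_bom, raw.decode("utf-8-sig")
```

Characters reported by `non_bmp_chars` are the ones that need a surrogate pair in UTF-16 and four bytes in UTF-8, which is where software limited to the BMP tends to break.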
Let me know if this helped you and if you know additional tests that I could include in this basic test.