The following is not meant to be an exhaustive look at the topic of Unicode, or the ISO 10646 standard that Unicode now tries to match character for character. It is meant to give a historical context to help explain where Visual Basic and other Microsoft products are in relation to Unicode, and hopefully where they are heading, as well. This history will cover most of the high points, and might be more functional than chronological. Where to start? Well, at the beginning, of course…
16-Bit Windows (Windows 3.0, 3.1, and 3.1x)
Windows was developed in the United States of America, by an American company, Microsoft. Unicode was not even an understood term at the time, so it’s not surprising that none of the 16-bit Windows operating systems spoke Unicode. There were some forward-looking people at Microsoft, however, who were looking to international markets, notably markets such as Japan. This caused Microsoft to begin thinking beyond codepage 1252 and into the Japanese and other Asian codepages. The expense of globalization and localization was not yet well understood, but the requirements of the Japanese market were very clear, clear enough that Microsoft jumped in with the intention of localizing.
However, localization at this stage was very primitive and involved U.S. developers “throwing their code over the fence” to the Japanese developers and then having them make the changes the product required and testing those changes. Unfortunately, those developers and testers made as many mistakes as their original counterparts, and these versions of the products were not supersets that could handle all languages, but subsets that just as often were broken for handling data that would work in the original version.
Thus localization existed, but globalization was relatively unknown. Because each locale simply kept to itself and cross-codepage interoperability was not a requirement, this caused no major problems for anyone.
COM in the 16-Bit World
COM did not really understand Unicode much better, for much the same reasons. In fact, the Component Object Model did not really embrace the importance of language/locale at all. It worked under the same assumption that it was good enough to work within the current system’s default codepage.
Visual Basic in the 16-Bit World
Early versions of Visual Basic worked under the same rules as other 16-bit applications. But, by the time Visual Basic 3.0 arrived, it was clear that a better way of dealing with software in general was needed to bring down the costs and improve the quality of products that needed to be sold in other countries. However, Visual Basic has always been very much based on the platform on which it sits, and without support from the foundation, the house is simply not going to be built in such a direction. So VB 3.0 stayed where it was, but people were looking forward, toward Windows New Technology, or Windows NT.
David Cutler’s second operating system (the first was for Digital) was built with the future in mind, and the important aspect for us is Windows NT’s full support for Unicode. The entire operating system was written from the ground up with the idea that a Unicode Kernel and Unicode support were the most important path into the operating system. This was met with skepticism, as most people simply saw strings twice as big as a real problem in an era when memory and hard drive space were at such premiums, but Mr. Cutler really was looking to the future. The “Unicode” he chose was UCS-2, the worldwide standard encapsulated by ISO 10646 that defined every character in terms of two bytes. The US English locale still had the “positional” advantage of being in the first 128 characters, but it no longer had the advantage over the Asian languages of taking up less space.
At the same time, reality crept in, and it became obvious that no one could move to a platform that did not support the “old way” of doing things. Therefore, ANSI would be supported for the sake of backward compatibility, and to enable all the existing applications to keep working (mostly). All the Win32 APIs that took strings now had two versions: an “A” version for ANSI and multibyte codepages (covering languages such as English, Dutch, and Japanese), and a “W” version that would use Unicode. At compile time, you would choose which set of APIs to use by choosing whether to compile with the Unicode flag, deciding, for example, whether the GetWindowLong call in your C/C++ code would be calling GetWindowLongA or GetWindowLongW. You could always choose to call one or the other explicitly, but you were encouraged not to. In theory, it would be easy for you to simply flip a switch one day and be in Unicode!
And why would developers do this? Well, first, there was the obvious strength of a worldwide EXE (which even Windows NT did not yet have, because many of its own core applications were not yet as enlightened as the Kernel, to the extent that a Kernel can be considered enlightened!). Second, any time you were dealing directly with Unicode, all your operations would be faster because no extra translations between ANSI and Unicode would be needed. People learned very quickly that the SDK documentation claims were accurate: the MultiByteToWideChar and WideCharToMultiByte functions could slow down an application.
One of the hidden implications of MultiByteToWideChar and WideCharToMultiByte was, of course, that codepage tables had to exist to assist in the conversion of strings between any given codepage and Unicode, and back again. Supporting a given language in a world where most applications were not really using Unicode internally meant supporting the codepage, as well.
Windows 95 was born under slightly different principles: It was, in many ways, a port of the original 16-bit Win 3.x codebase (just as most of the applications were, even under Windows NT). It was originally intended to fully support the same Win32 API as Windows NT, although in most cases the “W” versions of the API functions simply fail with an ERROR_CALL_NOT_IMPLEMENTED or E_NOTIMPL error. The limited set of Win32 API calls that support both Unicode and ANSI under Windows 95/98 appears in Table 6.1.
Table 6.1 The Win32 API Calls That Support Unicode Under All Platforms

Function                 What It Does
EnumResourceLanguages    Enumerates the languages supported by a specified resource name/type in a given module
EnumResourceNames        Enumerates all the resources of a specified type in a given module
EnumResourceTypes        Enumerates all the resource types in a given module
ExtTextOut               Writes a character string out to a specified location, optionally enabling parameters beyond what TextOut supports
FindResource             Finds a resource of a specified name and type
FindResourceEx           Finds a resource of a specified name and type, allowing a language to be specified
GetCharWidth             Retrieves the widths of specified characters
GetCommandLine           Retrieves the command-line string for the current process
GetTextExtentPoint       Computes the width and height of a given text string (provided for backward compatibility; GetTextExtentPoint32 is recommended)
GetTextExtentPoint32     Computes the width and height of a given text string
lstrlen                  Returns the length of a null-terminated string
MessageBox               Creates, displays, and operates a message box
MessageBoxEx             Creates, displays, and operates a message box, allowing the caller to specify a language for the predefined buttons
MultiByteToWideChar      Converts a multibyte string to a Unicode one, given the codepage with which to do the conversion
TextOut                  Writes a character string out to a specified location
WideCharToMultiByte      Converts a Unicode string to a multibyte one, given the codepage with which to do the conversion
Clearly, it can be challenging to write a Unicode application for Windows 95 or 98 given such a sparse set of tools. However, the original idea was that you would eventually just compile your application as Unicode for Windows NT and be done with it; only later did the need to support Unicode applications on Windows 95/98 become clear.