What you need to know about character sets and encoding schemes in Morfik

In an application or website built with Morfik there are three main kinds of character encoding that you need to be aware of:

  • Encoding used to send HTML to the browser;
  • Encoding used to store data in the database;
  • Encoding used for interfacing with the server operating system.


HTML and Database Encoding

A Morfik-built application uses UTF-8 encoding for both the HTML it sends to the browser and the project's built-in default database. At this time, these encodings are fixed and cannot be changed.

External Database Encoding

A Morfik application can access external databases that use a different encoding from that of the project itself. When you retrieve information through visual elements such as forms, there is no need to worry about the encoding, as the Morfik Framework takes care of any necessary conversions.

When you write database access code yourself, in a Web Method for example, you need to be aware of the difference in encoding and convert the data accordingly. The ChangeEncoding function converts a string from one charset to another. Its declaration is as follows.

Function ChangeEncoding(Const S, FromCharset, ToCharset : String) : String;
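
As a minimal sketch of how ChangeEncoding might be called from server-side code, the helper below converts a value read from an external database into UTF-8. The helper name ExternalTextToUtf8 and the charset names 'ISO-8859-1' and 'UTF-8' are assumptions made for illustration. Check which charset identifiers your Morfik version accepts and which encoding the external database actually uses.

```pascal
// Hypothetical helper: converts text fetched from an external database
// (assumed here to be stored as ISO-8859-1) into UTF-8.
Function ExternalTextToUtf8(Const RawText : String) : String;
Begin
  // Arguments follow the declared order: S, FromCharset, ToCharset.
  // The charset names are assumptions, not confirmed identifiers.
  Result := ChangeEncoding(RawText, 'ISO-8859-1', 'UTF-8');
End;
```

Data written back to the external database would presumably go through the same call with the two charset arguments swapped.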

Server Side Programming and Interfacing with Windows

Microsoft Windows uses UTF-16 encoding for all its native string handling. This means that when you pass an application string as a parameter to a Windows API function, you must convert the string to UTF-16 before handing it to Windows. Likewise, when you receive a string from the Windows API, you should reverse the conversion, changing the encoding from Windows' UTF-16 to UTF-8 or whatever other encoding your application uses.

To convert from UTF-8 to UTF-16 you should use the Utf8ToWideString function, and for the reverse operation, from UTF-16 back to UTF-8, you should use WideStringToUtf8.
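
The round trip can be sketched as follows. Taking the function names literally, Utf8ToWideString produces the UTF-16 string that Windows expects, and WideStringToUtf8 turns a UTF-16 result back into UTF-8. The sketch assumes that each function takes a single string argument and returns the converted string. The wrapper name RoundTripThroughWindows is hypothetical, and the Windows API call itself is left as a comment because it depends on the function you need.

```pascal
// Hypothetical wrapper showing where each conversion belongs.
// Assumes Utf8ToWideString(String) : WideString and
// WideStringToUtf8(WideString) : String; check the framework's declarations.
Function RoundTripThroughWindows(Const Utf8Text : String) : String;
Var
  Wide : WideString;
Begin
  // UTF-8 -> UTF-16 before the string is handed to Windows.
  Wide := Utf8ToWideString(Utf8Text);

  // ... call the wide-character Windows API function here,
  //     passing Wide and receiving any result into Wide ...

  // UTF-16 -> UTF-8 before the string re-enters application code.
  Result := WideStringToUtf8(Wide);
End;
```

Keeping both conversions in one wrapper means the rest of the application only ever sees UTF-8 strings.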

Third-Party Web Methods

Since you cannot control the messages returned by third-party code, you may need to place wrapper code around the calls and provide the translation based on the status code, when possible, or based on the actual message text.

It is not uncommon for different systems, especially those that were not designed from the start to be accessed by third parties, to use different encodings. This is especially true when you are interfacing a Morfik-built server with a client built with a desktop development tool. In many cases these tools do not use standards-compliant encoding by default.

Summary

Whenever you manually write code that handles data exchange between the browser and the server, or between the server portion of the application and the native operating system, you should know which encoding each side uses and convert between them where needed.
