Considerations on the database schema for multiple language support

preliminarystamp.jpg Pre-release information on the latest software features in Morfik. Please note that this is preliminary documentation and subject to change.

Localizing database-driven content is not an easy task, and there is not much choice when it comes to approaches that can be taken. Consider the simple Blog application which uses a very basic table with only two fields: PostDate and Content.

Let’s quickly go through the existing methods of localizing that table:

Using different columns

In this method, a separate field is used to store a localized version of the string in the same row. For example, if we decide to provide a Spanish version of our blog, we need to add another column Content_es.

This method is relatively simple to implement. The biggest issue with it is that you will have to restructure your table every time you add a new language. You will also have to be careful not to exceed the maximum row size limit that many databases have.

Using different rows

To allow adding languages without the need to restructure your database you can consider using a language identifier as a part of a key field. You will still need to add an additional field to your table, but it is done only once. In our example, the table will now have three fields: Language, PostDate, and Content.

While it avoids the need for restructuring, this method is far from perfect. The content of the row that is not localizable gets duplicated for every language. The implementation is not as simple: all the queries need to be updated to include additional filtering on language, falling back to the default value in case the translation isn’t found might require another query, and maintaining referential integrity or any kind of relationship between tables could become problematic if you have to duplicate the rows. And if the data in the table is being updated, you will have to take care of keeping multiple rows in sync.

Using different tables

Yet another approach is to use different tables for multiple languages. The benefit of this method is that no structure changes are required. Effectively, every localized table is just a clone of the original one. Using different tables does have its share of issues similar to those existing with the previous method. Also, any changes to the table structure will have to be applied to every clone.

Using different databases

The most extreme method is to have different databases altogether. The code changes are minimal, since it all becomes just a matter of connecting to the right database. However, maintaining multiple database connections could cause some performance issues, and data duplication is at its maximum with this approach. Synchronizing the changes becomes really problematic, and database maintenance in general becomes complicated. All in all this approach is not very practical.

Related Topics