Incorrect documentation around Unicode


#1

Hi folks,

The comments about Unicode vs. UTF-8 on http://www.aerospike.com/docs/client/java/usage/data_type.html are incorrect: Java and .NET use UTF-16 strings.
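A minimal JDK-only sketch of what "Java uses UTF-16 strings" means in practice. A Java `String` is a sequence of UTF-16 code units (`char`), so a character outside the Basic Multilingual Plane, such as U+1F600, takes two `char`s (a surrogate pair). The class name `Utf16Demo` and the sample string are purely illustrative.

```java
public class Utf16Demo {
    // "A" followed by U+1F600 (grinning face), written as its UTF-16 surrogate pair.
    static final String SAMPLE = "A\uD83D\uDE00";

    public static void main(String[] args) {
        // length() counts UTF-16 code units, not characters: prints 3.
        System.out.println(SAMPLE.length());
        // codePointCount() counts Unicode code points: prints 2.
        System.out.println(SAMPLE.codePointCount(0, SAMPLE.length()));
        // The second char is the high half of the surrogate pair: prints true.
        System.out.println(Character.isHighSurrogate(SAMPLE.charAt(1)));
    }
}
```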

http://www.joelonsoftware.com/articles/Unicode.html and http://kunststube.net/encoding/ explain the difference between Unicode and {UTF-8, UTF-16} better than I could.


#2

The statement in the documentation:

For example, an Aerospike String is internally stored in UTF-8 format

refers to how the string is stored in the database.


#3

Sorry, I should have been more specific. I was referring to this:

For example, an Aerospike String is internally stored in UTF-8 format. This allows Java and C# – which both use Unicode preferentially – to interact transparently with Python and Ruby (which use UTF-8) and C (which does not have a standard internal character encoding).

This doesn’t make sense: Unicode is not an alternative to UTF-8. Java and C# use UTF-16 strings internally. Unicode is a character set; UTF-8 and UTF-16 are both encodings of that character set.
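To illustrate the distinction, here is a small JDK-only sketch (it does not use the Aerospike client, and the names `EncodingDemo` and `byteLength` are hypothetical). The same two Unicode code points, U+0068 and U+00E9 ("hé"), produce different byte sequences under UTF-8 and UTF-16, and decoding the UTF-8 bytes gives back an identical Java string. That round trip is the kind of conversion any client has to perform when a UTF-16 `String` is stored as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Number of bytes needed to encode s in the given charset.
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s = "h\u00E9"; // two code points: 'h' and 'e' with acute accent
        // UTF-8: 'h' takes 1 byte, the accented e takes 2 bytes (0xC3 0xA9): prints 3.
        System.out.println(byteLength(s, StandardCharsets.UTF_8));
        // UTF-16BE (no byte-order mark): 2 bytes per code unit: prints 4.
        System.out.println(byteLength(s, StandardCharsets.UTF_16BE));
        // Encoding to UTF-8 and decoding again yields the same string: prints true.
        String roundTrip = new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        System.out.println(roundTrip.equals(s));
    }
}
```

UTF-16BE is used rather than `StandardCharsets.UTF_16` because the latter prepends a 2-byte byte-order mark when encoding, which would obscure the per-character comparison.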


#4

Got it, thanks. We will update the documentation.