Unicode

Unicode is an international character encoding standard. It assigns a unique number (a code point) to every character, regardless of platform, program, or language. It is also the most widely used character encoding standard today.

This content originally appeared on DEV Community and was authored by Zoran Luledzija

ASCII

ASCII (American Standard Code for Information Interchange) is one of the first widely used character encoding standards. People from the telecommunication and computing industries in America created it during the 1960s. As a 7-bit coding system, it supports 128 (i.e. 2⁷) characters: 95 printable characters (including the space) and 33 control characters. That was sufficient to encode digits, some special characters, and the letters of the English alphabet.
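To make the 7-bit limit concrete, here is a minimal Python sketch. The helper name `fits_in_ascii` and the sample strings are illustrative choices, not part of any standard API.

```python
# A 7-bit code has 2**7 = 128 possible values (0 through 127).
ASCII_SIZE = 2 ** 7


def fits_in_ascii(text: str) -> bool:
    """Return True if every character's code value is below 128."""
    return all(ord(ch) < ASCII_SIZE for ch in text)


print(ASCII_SIZE)                      # 128
print(ord("A"), ord("z"), ord("0"))    # 65 122 48
print(fits_in_ascii("Hello, World!"))  # True
print(fits_in_ascii("café"))           # False: "é" has no ASCII code
```

English text fits comfortably, while a single accented letter already falls outside the range.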

However, the spread of computing and the Internet created a need for other characters as well. As computers used 8-bit bytes, some manufacturers decided to use the remaining 8th bit and thus expand the number of characters to 256. This kind of 8-bit encoding is often referred to as “Extended ASCII” or “8-bit ASCII”. As more and more incompatible 8-bit encodings appeared, data exchange became complicated and error-prone, because the same byte value could stand for different characters in different encodings. That was a sign that a universal solution was needed, one that would work for all languages and cover all the special characters.
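The following sketch illustrates why exchanging 8-bit text was error-prone. It decodes the same single byte with three legacy 8-bit codecs that ship with Python; the choice of byte (0xE9) and of these particular codecs is only an example.

```python
# One byte with the 8th bit set, i.e. outside the 7-bit ASCII range.
raw = bytes([0xE9])

# Three legacy 8-bit encodings available in Python's standard library.
codecs = ["latin-1", "cp437", "koi8-r"]

# Decode the same byte with each encoding.
decoded = {name: raw.decode(name) for name in codecs}

for name, ch in decoded.items():
    print(f"{name:8} -> {ch!r} (U+{ord(ch):04X})")
```

Each codec maps the byte to a different character, so a receiver that guesses the wrong encoding sees the wrong text.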

Unicode

Unicode provides a unique code point for every character, in every language, in every program, on every platform. It enables a single document to contain text from different writing systems, which was nearly impossible with the earlier legacy encodings. Moreover, Unicode supports emojis, which are an indispensable part of communication today.
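As a small illustration, Python's built-in `ord` and `chr` convert between a character and its code point. The helper `code_point` below is a hypothetical name introduced only for this sketch; it formats the number in the conventional `U+XXXX` notation.

```python
def code_point(ch: str) -> str:
    """Format the code point of a single character as U+XXXX."""
    return f"U+{ord(ch):04X}"


# Latin, Cyrillic, a currency sign, and an emoji in one string.
for ch in "Aж€😀":
    print(ch, code_point(ch))

# chr() goes the other way: from code point back to character.
print(chr(0x20AC))  # €
```

Characters from different writing systems, and emojis, live side by side in the same string because each has its own code point.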

Unicode Transformation Formats

Unicode defines several encoding forms known as UTFs (Unicode Transformation Formats). Each of them defines how a code point is represented as a sequence of bytes. Below is a brief overview of the three UTFs that the Unicode Standard provides.

  • UTF-8
    • variable-length character encoding that uses from 1 to 4 bytes (from 8 to 32 bits)
    • backward compatible with ASCII
    • the most common encoding on the web (~98% of all web pages)
  • UTF-16
    • variable-length character encoding that uses 2 or 4 bytes (16 or 32 bits)
    • internally used by Microsoft Windows, Java, JavaScript, etc.
  • UTF-32
    • fixed-length character encoding that uses 4 bytes (32 bits)
    • simpler to process, since every code point has the same width, but uses more memory and bandwidth
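The size differences between the three formats can be checked with a short Python sketch. The helper `encoded_sizes` is a hypothetical name used only here, and the little-endian codec variants are chosen so that no byte order mark is added to the output.

```python
# The "-le" (little-endian) variants do not prepend a byte order mark.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes each UTF needs for the given text."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


for ch in "Aé€😀":
    print(ch, encoded_sizes(ch))

# UTF-8 is backward compatible with ASCII: the bytes are identical.
print("Hello".encode("utf-8") == "Hello".encode("ascii"))  # True
```

UTF-8 grows from 1 to 4 bytes across these characters, UTF-16 needs 2 bytes for the first three and 4 for the emoji, and UTF-32 always uses 4.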

Final thoughts

Thanks to Unicode, today's software runs in a variety of languages and on a variety of platforms, which was hard to imagine a few decades ago. In other words, today's software localization would be impossible without such an encoding standard.

You can find more details about Unicode in the original post.


Zoran Luledzija | Sciencx (2021-12-22T16:37:26+00:00) Unicode. Retrieved from https://www.scien.cx/2021/12/22/unicode/
