{"id":526,"date":"2016-09-25T12:15:49","date_gmt":"2016-09-25T06:45:49","guid":{"rendered":"http:\/\/www.cyberaka.com\/?p=526"},"modified":"2016-09-25T12:15:49","modified_gmt":"2016-09-25T06:45:49","slug":"measuring-utf-8-character-size","status":"publish","type":"post","link":"https:\/\/www.cyberaka.com\/?p=526","title":{"rendered":"Measuring UTF-8 character size"},"content":{"rendered":"<p>Let&#8217;s say I type something in Hindi and the result is the text below:<\/p>\n<p style=\"padding-left: 30px;\">\u0924\u0940\u0928 \u0935\u094d\u092f\u093e\u0915\u093f\u0924\u093f\u092f\u094b\u0902 \u0915\u0940 \u0914\u0938\u0924 \u0906\u092f\u0941 \u0969\u0969 \u0935\u0930\u094d\u0937 \u0939\u0948.<\/p>\n<p>In a hex view, the UTF-8 bytes look like this (the dump continues past the sentence shown above):<\/p>\n<p>e0 a4 a4 e0 a5 80 e0 a4 a8 20 e0 a4 b5 e0 a5 8d e0 a4 af e0 a4 be e0 a4 95 e0 a4 bf e0 a4 a4 e0 a4 bf e0 a4 af e0 a5 8b e0 a4 82 20 e0 a4 95 e0 a5 80 20 e0 a4 94 e0 a4 b8 e0 a4 a4 20 e0 a4 86 e0 a4 af e0 a5 81 20 e0 a5 a9 e0 a5 a9 20 e0 a4 b5 e0 a4 b0 e0 a5 8d e0 a4 b7 20 e0 a4 b9 e0 a5 88 2e e0 a4 85 e0 a4 97 e0 a4 b0 20 e0 a4 89 e0 a4 a8 e0 a4 95 e0 a5 80 20 e0 a4 86 e0 a4 af e0 a5 81 20 e0 a5 a8 3a e0 a5 a9 3a e0 a5 aa 20 e0 a4 95 e0 a5 87 20 e0 a4 85 e0 a4 a8 e0 a5 81 e0 a4 aa e0 a4 be e0 a4 a4 20 e0 a4 ae e0 a5 87 e0 a4 82 20 e0 a4 b9 e0 a5 8b 2c e0 a4 a4 e0 a5 8b e0 a4 b9 20 e0 a4 89 e0 a4 a8 e0 a4 ae e0 a5 87 20 e0 a4 b8 e0 a5 87 20 e0 a4 b8 e0 a4 ac e0 a4 b8 e0 a5 87 20 e0 a4 ac e0 a5 9c e0 a5 87 20 e0 a4 95 e0 a5 80 20 e0 a4 86 e0 a4 af e0 a5 81 20 e0 a4 95 e0 a4 bf e0 a4 af e0 a5 8d e0 a4 a4 e0 a4 a8 e0 a5 87 20 e0 a4 b5 e0 a4 b0 e0 a5 8d e0 a4 b7 20 e0 a4 b9 e0 a5 8b e0 a4 97 e0 a5 80 3f<\/p>\n<p>The main thing to notice here is that every Hindi character starts with the byte &#8220;E0&#8221;, while spaces and punctuation take a single byte such as &#8220;20&#8221; or &#8220;2e&#8221;. E0 is not a code point; it is the lead byte of the sequence, and its high bits (1110) indicate that the character is encoded in 3 bytes. 
The following table summarizes how the first byte determines the length of a UTF-8 sequence:<\/p>\n<pre><code>Binary    Hex          Comments\r\n0xxxxxxx  0x00..0x7F   Only byte of a 1-byte character encoding\r\n10xxxxxx  0x80..0xBF   Continuation byte (1-3 of these follow a first byte)\r\n110xxxxx  0xC0..0xDF   First byte of a 2-byte character encoding\r\n1110xxxx  0xE0..0xEF   First byte of a 3-byte character encoding\r\n11110xxx  0xF0..0xF4   First byte of a 4-byte character encoding\r\n<\/code><\/pre>\n<p>Reference: https:\/\/stackoverflow.com\/questions\/5290182\/how-many-bytes-does-one-unicode-character-take\/33349765#33349765<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Let&#8217;s say I type something in Hindi and the outcome is listed below: \u0924\u0940\u0928 \u0935\u094d\u092f\u093e\u0915\u093f\u0924\u093f\u092f\u094b\u0902 \u0915\u0940 \u0914\u0938\u0924 \u0906\u092f\u0941 \u0969\u0969 \u0935\u0930\u094d\u0937 \u0939\u0948. In Hex View it will look like: e0 a4 a4 e0 a5 80 e0 a4 a8 20 e0 a4 b5 e0 a5 8d e0 a4 af e0 a4 be e0 a4 95 e0 a4 
[&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[23],"tags":[],"class_list":["post-526","post","type-post","status-publish","format-standard","hentry","category-programming"],"_links":{"self":[{"href":"https:\/\/www.cyberaka.com\/index.php?rest_route=\/wp\/v2\/posts\/526","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.cyberaka.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.cyberaka.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.cyberaka.com\/index.php?rest_route=\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.cyberaka.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=526"}],"version-history":[{"count":1,"href":"https:\/\/www.cyberaka.com\/index.php?rest_route=\/wp\/v2\/posts\/526\/revisions"}],"predecessor-version":[{"id":527,"href":"https:\/\/www.cyberaka.com\/index.php?rest_route=\/wp\/v2\/posts\/526\/revisions\/527"}],"wp:attachment":[{"href":"https:\/\/www.cyberaka.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=526"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.cyberaka.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=526"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.cyberaka.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=526"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}