Monthly Archives: September 2016

Measuring UTF-8 character size

Lets say I type something in Hindi and the outcome is listed below:

तीन व्याकितियों की औसत आयु ३३ वर्ष है.

In Hex View it will look like:

e0 a4 a4 e0 a5 80 e0 a4 a8 20 e0 a4 b5 e0 a5 8d e0 a4 af e0 a4 be e0 a4 95 e0 a4 bf e0 a4 a4 e0 a4 bf e0 a4 af e0 a5 8b e0 a4 82 20 e0 a4 95 e0 a5 80 20 e0 a4 94 e0 a4 b8 e0 a4 a4 20 e0 a4 86 e0 a4 af e0 a5 81 20 e0 a5 a9 e0 a5 a9 20 e0 a4 b5 e0 a4 b0 e0 a5 8d e0 a4 b7 20 e0 a4 b9 e0 a5 88 2e e0 a4 85 e0 a4 97 e0 a4 b0 20 e0 a4 89 e0 a4 a8 e0 a4 95 e0 a5 80 20 e0 a4 86 e0 a4 af e0 a5 81 20 e0 a5 a8 3a e0 a5 a9 3a e0 a5 aa 20 e0 a4 95 e0 a5 87 20 e0 a4 85 e0 a4 a8 e0 a5 81 e0 a4 aa e0 a4 be e0 a4 a4 20 e0 a4 ae e0 a5 87 e0 a4 82 20 e0 a4 b9 e0 a5 8b 2c e0 a4 a4 e0 a5 8b e0 a4 b9 20 e0 a4 89 e0 a4 a8 e0 a4 ae e0 a5 87 20 e0 a4 b8 e0 a5 87 20 e0 a4 b8 e0 a4 ac e0 a4 b8 e0 a5 87 20 e0 a4 ac e0 a5 9c e0 a5 87 20 e0 a4 95 e0 a5 80 20 e0 a4 86 e0 a4 af e0 a5 81 20 e0 a4 95 e0 a4 bf e0 a4 af e0 a5 8d e0 a4 a4 e0 a4 a8 e0 a5 87 20 e0 a4 b5 e0 a4 b0 e0 a5 8d e0 a4 b7 20 e0 a4 b9 e0 a5 8b e0 a4 97 e0 a5 80 3f

The main thing to notice here is that every hindi character is starting with the byte “E0”. This is basically a code point which identifies the code size of the UTF-8 character. The following table appropriate highlights it:

Binary    Hex          Comments
0xxxxxxx  0x00..0x7F   Only byte of a 1-byte character encoding
10xxxxxx  0x80..0xBF   Continuation bytes (1-3 continuation bytes)
110xxxxx  0xC0..0xDF   First byte of a 2-byte character encoding
1110xxxx  0xE0..0xEF   First byte of a 3-byte character encoding
11110xxx  0xF0..0xF4   First byte of a 4-byte character encoding



Embedding images directly into HTML

I needed to email an HTML report and I was not happy with the fact that we have to create zip file containing the HTML, Images etc. I knew it was possible to embed base64 data directly into HTML I started looking for some PoC. My search took me to an online base64 encoder and decoder. This gave me the idea to convert the external image files into base64 string and embed it directly into HTML. On some search I landed up Apache Commons Codec Library which contains a Base64 class which can be used to covert an input stream into a base64 string.

JSON to Java Code Generator

I recently received a couple of REST Web Service endpoints where the implementation technology was not Java. So I had to either rollout my own object and then rely on the object mapper to map the JSON to Java or come up with an approach to generate the relevant POJO file. On a little bit of looking around I found out the following two links for generating Java Code from JSON.

The POJO generated were decent and got the job done. I didn’t have to write the POJO myself and the code generated by these websites did the job well.

Another link recommended in the posts comments. (Thanks Daniel)

I found out the 1st two links are no longer working. So I found out about another tool based on Visual Studio Code. Use the following extension for solving this use case.

Delete Untracked File Using Git

Sometimes we want to clean out our Git workspace and remove any file which is untracked. A hard reset usually does the job for modified files as shown below:
git reset --hard

But hard reset doesn’t fit in case of untracked files. The following command will do the job:
git clean -d -fx ""

-x means ignored files are also removed as well as files unknown to git.

-d means remove untracked directories in addition to untracked files.

-f is required to force it to run.


Leveraging Tee and Grep for filtering program output

I have a utility which compiles my UI code and minifies them. It works pretty well and doesn’t requires much supervision. Trouble starts if the code compilation fails and it gets ignored in the huge log file which is generated by this utility. I use the following approach to filter out errors and yet preserving the output of the utility program for any diagnostic in case of any error.

./ | tee output.txt | grep error

Basically what happens is that “” generates a lot of output which is piped into the “tee” command which in turn dumps it into a text file named “output.txt” and then the same output is again piped into the “grep” command which looks for any error text in the output. If the error is found it is outputted on the console.

So the end result is that in 99% cases I don’t see any output from the utility as I don’t want to see debug output. However the moment an error happens it gets flagged on console and I have the “output.txt” file to audit what went wrong and the root cause of the error.

Configuring Git user name and email

Since I like working using terminal I prefer to do all my operations of git from the command prompt. The first prerequisite to this is to have my name and email configured. This few commands are required to do this simple operation:

To set the name and email:

git config --global "Your Name"
git config --global "Your Email"

To view the name and email which is known to git execute the same commands without any parameter as shown below:

git config --global
git config --global