Measuring UTF-8 character size

Lets say I type something in Hindi and the outcome is listed below:

तीन व्याकितियों की औसत आयु ३३ वर्ष है.

In Hex View it will look like:

e0 a4 a4 e0 a5 80 e0 a4 a8 20 e0 a4 b5 e0 a5 8d e0 a4 af e0 a4 be e0 a4 95 e0 a4 bf e0 a4 a4 e0 a4 bf e0 a4 af e0 a5 8b e0 a4 82 20 e0 a4 95 e0 a5 80 20 e0 a4 94 e0 a4 b8 e0 a4 a4 20 e0 a4 86 e0 a4 af e0 a5 81 20 e0 a5 a9 e0 a5 a9 20 e0 a4 b5 e0 a4 b0 e0 a5 8d e0 a4 b7 20 e0 a4 b9 e0 a5 88 2e e0 a4 85 e0 a4 97 e0 a4 b0 20 e0 a4 89 e0 a4 a8 e0 a4 95 e0 a5 80 20 e0 a4 86 e0 a4 af e0 a5 81 20 e0 a5 a8 3a e0 a5 a9 3a e0 a5 aa 20 e0 a4 95 e0 a5 87 20 e0 a4 85 e0 a4 a8 e0 a5 81 e0 a4 aa e0 a4 be e0 a4 a4 20 e0 a4 ae e0 a5 87 e0 a4 82 20 e0 a4 b9 e0 a5 8b 2c e0 a4 a4 e0 a5 8b e0 a4 b9 20 e0 a4 89 e0 a4 a8 e0 a4 ae e0 a5 87 20 e0 a4 b8 e0 a5 87 20 e0 a4 b8 e0 a4 ac e0 a4 b8 e0 a5 87 20 e0 a4 ac e0 a5 9c e0 a5 87 20 e0 a4 95 e0 a5 80 20 e0 a4 86 e0 a4 af e0 a5 81 20 e0 a4 95 e0 a4 bf e0 a4 af e0 a5 8d e0 a4 a4 e0 a4 a8 e0 a5 87 20 e0 a4 b5 e0 a4 b0 e0 a5 8d e0 a4 b7 20 e0 a4 b9 e0 a5 8b e0 a4 97 e0 a5 80 3f

The main thing to notice here is that every hindi character is starting with the byte “E0”. This is basically a code point which identifies the code size of the UTF-8 character. The following table appropriate highlights it:

Binary    Hex          Comments
0xxxxxxx  0x00..0x7F   Only byte of a 1-byte character encoding
10xxxxxx  0x80..0xBF   Continuation bytes (1-3 continuation bytes)
110xxxxx  0xC0..0xDF   First byte of a 2-byte character encoding
1110xxxx  0xE0..0xEF   First byte of a 3-byte character encoding
11110xxx  0xF0..0xF4   First byte of a 4-byte character encoding



Embedding images directly into HTML

I needed to email an HTML report and I was not happy with the fact that we have to create zip file containing the HTML, Images etc. I knew it was possible to embed base64 data directly into HTML I started looking for some PoC. My search took me to an online base64 encoder and decoder. This gave me the idea to convert the external image files into base64 string and embed it directly into HTML. On some search I landed up Apache Commons Codec Library which contains a Base64 class which can be used to covert an input stream into a base64 string.

JSON to Java Code Generator

I recently received a couple of REST Web Service endpoints where the implementation technology was not Java. So I had to either rollout my own object and then rely on the object mapper to map the JSON to Java or come up with an approach to generate the relevant POJO file. On a little bit of looking around I found out the following two links for generating Java Code from JSON.

The POJO generated were decent and got the job done. I didn’t have to write the POJO myself and the code generated by these websites did the job well.

Delete Untracked File Using Git

Sometimes we want to clean out our Git workspace and remove any file which is untracked. A hard reset usually does the job for modified files as shown below:
git reset --hard

But hard reset doesn’t fit in case of untracked files. The following command will do the job:
git clean -d -fx ""

-x means ignored files are also removed as well as files unknown to git.

-d means remove untracked directories in addition to untracked files.

-f is required to force it to run.


Leveraging Tee and Grep for filtering program output

I have a utility which compiles my UI code and minifies them. It works pretty well and doesn’t requires much supervision. Trouble starts if the code compilation fails and it gets ignored in the huge log file which is generated by this utility. I use the following approach to filter out errors and yet preserving the output of the utility program for any diagnostic in case of any error.

./ | tee output.txt | grep error

Basically what happens is that “” generates a lot of output which is piped into the “tee” command which in turn dumps it into a text file named “output.txt” and then the same output is again piped into the “grep” command which looks for any error text in the output. If the error is found it is outputted on the console.

So the end result is that in 99% cases I don’t see any output from the utility as I don’t want to see debug output. However the moment an error happens it gets flagged on console and I have the “output.txt” file to audit what went wrong and the root cause of the error.

Configuring Git user name and email

Since I like working using terminal I prefer to do all my operations of git from the command prompt. The first prerequisite to this is to have my name and email configured. This few commands are required to do this simple operation:

To set the name and email:

git config --global "Your Name"
git config --global "Your Email"

To view the name and email which is known to git execute the same commands without any parameter as shown below:

git config --global
git config --global

Finding class inside a bunch of jar

Many times there are some Java linkage errors and I have to find out in which jar files the class files are located. So this has lead me to find out tools which can do this job for me. I usually get the job done by using this excellent open source tools named Jar Explorer inside Github. It is basically a platform independent Swing based utility which allows you to recursively search inside Jar files located inside a folder for any class name String. So it is possible for me to search for a class named “LoggingEvent” inside a folder containing lots of jars and it outputs the list of all the jar files where it found classes containing the text “LoggingEvent”.

However when you are connected to Linux consoles using ssh and don’t have access to X Windowing system then you have to rely on either text based java program or pure vanilla shell scripting. For this situation I use the following snippets of code which I found from a Stackoverflow article.

On Linux/Mac
for i in *.jar; do jar -tvf "$i" | grep -Hsi ClassName && echo "$i"; done

On Windows
for /R %G in (*.jar) do @jar -tvf "%G" | find "ClassName" > NUL && echo %G

Using tput to label tmux panes

As I had pointed out in my post about tmux that now I am using tmux to configure and debug multiple servers inside a single window split into panes. Now I ran to another problem I always forgot which pane was meant for which service. Beyond 2 to 3 panes it was getting confusing to remember which pane is monitoring which service. So I remembered my previous post related to tput which allows anybody to show a running clock inside a linux terminal. So I decided to provision a small shell script to fix this issue. Basically I wanted a way to label each pane so that I could effortlessly identify the purpose of the pane inside tmux. So here is the code:

#Display Service Name

function die {
echo "Dying on signal $1"
exit 0

function redraw {
local width length;
width=$(tput cols);
tput sc;
tput cup 0 $((width-length));
set_foreground=$(tput setaf 7)
set_background=$(tput setab 1)
echo -n $set_background$set_foreground
printf ' Service:%s ' $str
tput sgr0;
tput rc;

trap 'die "SIGINT"' SIGINT
trap 'die "SIGQUIT"' SIGQUIT

trap redraw WINCH;

while true; do
redraw $*;
sleep 1;

This shell script takes a parameter and shows it on the top right column in a red background with white foreground. This script should be invoked in this way.

./ service-name &

Invoking this script in every tmux pane with relevant substitution for service-name is giving me this result:
Screen Shot 2016-02-11 at 1.59.55 pm

Please note that this script works well on my MacBook. I am yet to test it on a Linux terminal.