Author Archives: cyberaka

About cyberaka

I am an experienced Senior Solution Architect with a proven history of designing robust and highly available Java-based solutions. I am skilled in architecting, designing and developing scalable, highly available, fault-tolerant and concurrent systems which can serve high-volume traffic. I have hands-on experience in designing RESTful microservices architectures using Spring Boot, Spring Cloud, MongoDB, Java 8, Redis and Kafka. I like TDD (JUnit), BDD (Cucumber) and DDD based development as required for my projects. I have used AWS primarily for cloud-based deployment, and I like developing cloud-enabled POCs as a hobby in my spare time. I have designed and developed CI/CD pipelines using Jenkins and leveraged Docker and Kubernetes for containerizing and deploying some of my applications. I am highly experienced in building high-performing technical teams from scratch. As an ex-entrepreneur I am very much involved in the business side of the IT industry. I love interacting with clients to understand their requirements and get the job done.

MySQL Slowdown With Large Inserts

On a plain-vanilla Windows system with approximately 6 GB of RAM and a XAMPP-based MySQL installation, I found that as the number of inserts grew, the inserts became slower and slower. Ultimately it came down to one insert every 2 seconds!

This was totally unacceptable, so I looked around for a solution to this problem and I found one here.

Based on the article above, I started looking into the MySQL configuration file and found the following entries:

#innodb_log_arch_dir = "D:/xampp/mysql/data"
## You can set .._buffer_pool_size up to 50 - 80 %
## of RAM but beware of setting memory usage too high
innodb_buffer_pool_size = 16M
innodb_additional_mem_pool_size = 2M
## Set .._log_file_size to 25 % of buffer pool size
innodb_log_file_size = 5M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50

So my database is using the InnoDB storage engine, but the buffer pool size is only 16 MB, with an additional memory pool of just 2 MB. In my opinion this is far too little, keeping in mind the comment provided in the file itself:

## You can set .._buffer_pool_size up to 50 - 80 %
## of RAM but beware of setting memory usage too high

So on a 6 GB RAM system I could in theory give most of that RAM to the buffer pool, but it is a development machine and I wanted the developer to have decent performance for other work as well. So I tweaked the configuration a little, and now it looks like this:

#innodb_log_arch_dir = "D:/xampp/mysql/data"
## You can set .._buffer_pool_size up to 50 - 80 %
## of RAM but beware of setting memory usage too high
innodb_buffer_pool_size = 1000M
innodb_additional_mem_pool_size = 250M
## Set .._log_file_size to 25 % of buffer pool size
innodb_log_file_size = 50M
innodb_log_buffer_size = 80M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50

At the moment innodb_buffer_pool_size is 1000 MB, with innodb_additional_mem_pool_size raised to 250 MB.

I also increased innodb_log_file_size to 50 MB and innodb_log_buffer_size to 80 MB (I simply multiplied the default values by 10).
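
To confirm that the new values were actually picked up after restarting MySQL, the relevant settings can be read back from the server itself (a quick check, assuming the mysql command line client from the XAMPP installation is on the path):

# Print every InnoDB sizing variable; the values are reported in bytes
mysql -u root -p -e "SHOW VARIABLES LIKE 'innodb%size';"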

I started the inserts again into the same table, using the same program with no other changes, and I see a marked improvement in performance. The inserts have not slowed down yet; I am able to insert 10-14 records per second.

Update after 3 hours:
I observed that after inserting close to 110,000 records, the inserts slowed down to 4-5 per second. I have read an article which summarizes how InnoDB actually works, so I think I will be tweaking the configuration a little more.
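
One candidate for that next round of tweaking is innodb_flush_log_at_trx_commit, which is set to 1 in the configuration above. Relaxing it to 2 makes InnoDB flush the log to disk roughly once per second instead of on every commit, which is usually an acceptable trade-off on a development box. A sketch of the change (not applied yet, and assuming the root password is known):

# Trade a little durability for insert speed; this variable is dynamic,
# so it can be changed without restarting MySQL
mysql -u root -p -e "SET GLOBAL innodb_flush_log_at_trx_commit = 2;"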

Searching for file containing keyword in Linux

Grep fits the bill for all my requirements for efficiently searching files containing text on Linux and Mac. The following commands detail the use cases of grep.

Search for a pattern
grep -rnw 'folder' -e 'text'

-r stands for recursive search.
-n prints the line number of each match.
-w stands for whole-word match.

Example: grep -rnw . -e 'import'
This searches recursively for the text 'import' in the current directory.
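
If only the list of matching files is needed, rather than every matching line, the -l flag can be used in place of -n; a small variation on the same command:

# List only the names of files that contain the whole word 'import'
grep -rlw . -e 'import'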

Found this tip at stackoverflow.com.

Debugging HTTP Traffic in Mac

I use Fiddler extensively on Windows for debugging HTTP traffic in my web applications. However, Fiddler is not available on the Mac. After some searching I found the following tools which can do the job:

Chrome
In Chrome, just type "chrome://net-internals/#http2" in the address bar and you can see the HTTP/2 sessions that Chrome itself has open. I am not sure when this was added to Chrome, but it is a very simple yet powerful utility.

Charles Proxy
This is a Java-based commercial utility which provides almost Fiddler-like functionality. It works on Windows, Linux and Mac, so I think it is a good deal for any developer.

Update: After using Charles Proxy I found it dead simple to use, and it fulfilled all the requirements I had on my Mac. There is a 30-minute session limit in the unlicensed version, which seems fair. It is completely worth the $50 price tag.
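
For command-line requests, traffic can be routed through Charles explicitly so that it shows up in the capture window (a quick sketch, assuming Charles is listening on its default proxy port 8888 on localhost):

# Send a request through the local Charles proxy so it appears in the session
curl -x http://localhost:8888 http://example.com/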

Using grep, sed to filter data in Linux, Mac OS

I recently received a blob of data which needed some alteration to properly fit the CSV format. The data file 'xyz.dat' contained close to 600,000 rows, so writing a VBA script in Excel to format the data was not an option. I opted to do it with a few plain Unix commands. I had to accomplish the following:

1. Remove all blank lines.
2. Remove all rows that don't start with the number '1'.
3. Trim the file to 1,000 rows to sample the data.

I executed the following commands to accomplish the above objectives.

To remove all blank lines I used this command. I am basically asking grep to treat the file as text (the -a flag) and using a regular expression to match blank lines; -v inverts the match so the blank lines are dropped:
cat xyz.dat | grep -a -v -e '^[[:space:]]*$' > xyz_no_space.dat

To remove all rows that don't start with '1' I used this command. It is similar to the command above, except that the regular expression now matches rows starting with '1' and the -v flag is dropped, so matching rows are kept rather than removed:
cat xyz_no_space.dat | grep -a '^1' > xyz_no_space_start_with_1.dat

At this point we have the data we want, but for sampling purposes I need to copy the first 1,000 rows into another file. This command uses 'sed' to do that:
sed -n -e '1,1000p' xyz_no_space_start_with_1.dat > xyz_trimmed_1_1000.dat
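
The same trimming could also be done with head, which may read more naturally for "first N lines" (an equivalent alternative, not what I actually ran):

# Equivalent: copy the first 1000 lines into the sample file
head -n 1000 xyz_no_space_start_with_1.dat > xyz_trimmed_1_1000.dat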

Finally, this command renames the .dat file to .csv for convenience:
mv xyz_trimmed_1_1000.dat xyz_trimmed_1_1000.csv

A summary of all commands I executed is listed below:
cat xyz.dat | grep -a -v -e '^[[:space:]]*$' > xyz_no_space.dat
cat xyz_no_space.dat | grep -a '^1' > xyz_no_space_start_with_1.dat
sed -n -e '1,1000p' xyz_no_space_start_with_1.dat > xyz_trimmed_1_1000.dat
mv xyz_trimmed_1_1000.dat xyz_trimmed_1_1000.csv
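
The intermediate files are only there so each step can be inspected; the same result could be produced in a single pipeline (a condensed version of the commands above):

# Blank-line removal, '1'-prefix filter and 1000-row sample in one pass
grep -a -v -e '^[[:space:]]*$' xyz.dat | grep -a '^1' | sed -n -e '1,1000p' > xyz_trimmed_1_1000.csv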