Monday, July 24, 2006

Improving performance by changing system bottlenecks

We were in the process of trying to tune some code that has been around for about 2-3 years. The system is pretty straightforward. We get a bunch of files for each country. We read and process the files in 3 groups - pre, main and post process - where the processing done in each group depends on some processing in the previous group. In the post process, we read back data that was written to the DB in the pre and main processes, do some processing on it, and then write it back to the DB. The reason we read from the DB is that the amount of data processed in pre and main is huge and cannot be kept in memory until the post process is triggered.

We were already using threads to run the sub-processes in each process (pre, main & post) in parallel unless there were any dependencies. We were using connection pools and object pools for heavy objects. We were at a loss to figure out what more could be squeezed out. We started questioning our flow..

This is what our simplified conversation looked like.., though these questions were raised over a period of 2-3 days and not all on the same day.

Q: Which process takes the most time?
A: Post process takes as much time as pre and main combined.

Q: How much of the post process time is actual value-add processing?
A: Reading/writing of data takes 98% of the time and processing the data takes 2%

Q: Why did we have to go to the DB in the post process to read data that we wrote into the DB in the same JVM in the pre and main processes?
A: Coz the amount of data if held in memory would increase heap usage by close to 900MB

Q: What do we need so that we don't have to read data from the DB?
A: A good cache manager that persists data if heap usage increases and whose read time is better than a DB read

Q: Will things like JCache etc work?
A: They will, but the keys are not single objects - they are queries that have 2-3 where-clause entries.

Q: Why not write our own cache implementation
A: Get a life !

Q: What is the distribution of reads/writes
A: For every 7 reads we do one write

Q: Why are reads so slow?
A: Coz it's a network call u bozo

Q: Will having the DB in the same box as java process help?
A: Might, but mostly might not, since we still have to go thru all 7 network layers, plus the unix box is connected to the DB box via a 100Mbps dedicated link

Q: How do u get to remove the 7 network layers involved?
A: Only if u put the DB process into the Java process

.. and then it dawned on us: would an in-memory database remove this bottleneck? We decided to cache all the data in an in-memory database and read from that in the post process to speed up the whole process.
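Before settling on an in-memory database, the sticking point above was that our cache keys were really queries with 2-3 where-clause entries, not single objects. That part can be sketched with a plain composite key class - a minimal, hypothetical sketch (the class and field names here are made up for illustration, not from our system):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a cache keyed by the 2-3 values that would otherwise
// appear in a query's WHERE clause (e.g. country, batch, record type).
public class RecordCache {

    // Composite key standing in for "WHERE country=? AND batch=? AND type=?"
    static final class QueryKey {
        final String country;
        final String batch;
        final String type;

        QueryKey(String country, String batch, String type) {
            this.country = country;
            this.batch = batch;
            this.type = type;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof QueryKey)) return false;
            QueryKey k = (QueryKey) o;
            return country.equals(k.country)
                && batch.equals(k.batch)
                && type.equals(k.type);
        }

        @Override public int hashCode() {
            return 31 * (31 * country.hashCode() + batch.hashCode()) + type.hashCode();
        }
    }

    private final Map<QueryKey, String> store = new ConcurrentHashMap<QueryKey, String>();

    void put(String country, String batch, String type, String value) {
        store.put(new QueryKey(country, batch, type), value);
    }

    String get(String country, String batch, String type) {
        return store.get(new QueryKey(country, batch, type)); // null if absent
    }
}
```

A map like this only covers exact-match lookups, of course - which is part of why an in-memory database, with real query support, was the better fit for us.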

Then we asked again: since there is no need for us to hit the DB, what more can be pruned off?

Q: Dude why do we need the post process, can u tell me once again?
A: To add data from main and pre process into DB

Q: Why can't we do it in main itself?
A: Hmm..historically it was never so.... but i think it makes sense too...but we need some data from the pre process to be mixed with some data from the main block, so that's the reason i suppose

Q: Can't we have the pre block data mix with the main block data in the main/pre blocks itself?
A: Oh yeah, only if u want to read the same file in both blocks

Q: How long does it take to read the file to get the data in the pre block?
A: Hmm...not more than 20-30 seconds i suppose, so it should be ok to read the file multiple times without any performance issues

Q: Do u still need the post process block?
A: Hmm...most of the data mixing can be done in the main/pre blocks by parsing some files in both pre and main. This way we don't have to re-query the DB for data in the post block, and that'll save us running around 10K queries. But some processes still need to be in the post block - they are lightweight though.

Q: Hmm...we still block when moving from one process block to the next (like pre to main) waiting for the oracle queries to complete execution.
A: Hmm..that's interesting...since we are using an in-memory db, the inserts to that DB are nearly 3-4X faster than Oracle inserts. So why not block only on the in-memory queries and let the oracle inserts go on in the background? We can use a jdk5 concurrent pool to run the oracle queries in the background and let the JVM terminate when all the futures (java.util.concurrent.Future) are done.

A: You better stop, my head's spinning....argghhh !
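The jdk5 pattern from that last answer can be sketched roughly like this - block on the fast in-memory insert, hand the slow Oracle insert to a background pool, and drain the Futures before letting the JVM exit. The insert bodies and names here are stand-ins, not our actual SQL:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the pattern: wait only for cheap in-memory inserts,
// run expensive Oracle inserts in the background, and block on the Futures
// once, at shutdown time.
public class BackgroundInserts {

    private final ExecutorService oraclePool = Executors.newFixedThreadPool(4);
    private final List<Future<?>> pending = new ArrayList<Future<?>>();

    // Counter exists only so the sketch's behaviour is observable.
    final AtomicInteger oracleRowsWritten = new AtomicInteger();

    void insertIntoInMemoryDb(String row) {
        // the fast, blocking in-memory insert would go here
    }

    void insertIntoOracle(String row) {
        // the slow, network-bound Oracle insert would go here
        oracleRowsWritten.incrementAndGet();
    }

    void process(final String row) {
        insertIntoInMemoryDb(row);                      // we wait for this one
        pending.add(oraclePool.submit(new Runnable() {  // this one runs in the background
            public void run() { insertIntoOracle(row); }
        }));
    }

    // Called once at the end: wait for all background inserts, then shut down,
    // so the JVM only terminates after every Future is done.
    void drain() throws Exception {
        for (Future<?> f : pending) {
            f.get();
        }
        oraclePool.shutdown();
    }
}
```

The point of the rearrangement is that the Oracle latency is still paid, but it overlaps with useful work instead of sitting on the critical path.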

FYI we used Derby DB from Apache as our in-memory db to speed up the process. We used two thread pools - one that ran the insert statements into the in-memory Derby DB and another that inserted into the Oracle DB. We ended up parsing a 10MB xml file twice, but parsing with SAX only took around 20 seconds of our processing time so it was no biggie. Also, contrary to popular belief, running the two sets of inserts did not degrade performance, since we were no longer blocking on the DB calls that took most of the time. So we re-arranged our entire set of bottlenecks such that we only waited for the oracle inserts to complete when we were ready to end the process and shut down the JVM.
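The reason re-parsing the 10MB file is cheap is that SAX streams events instead of building the whole document in memory. A minimal sketch using the JDK's built-in SAX support (the element name here is made up for illustration):

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class CountingSaxParse {

    // Counts <record> elements without building a DOM, so memory use stays
    // flat regardless of file size - which is why a second pass is cheap.
    static int countRecords(String xml) throws Exception {
        final int[] count = {0};
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if ("record".equals(qName)) {
                        count[0]++;
                    }
                }
            });
        return count[0];
    }
}
```

In the real system the handler would accumulate the pre/main mixing data instead of a count, but the memory characteristics are the same.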

So what was my learning from the entire experience ...

Any performance improvement process should consist of the following steps

1) Use a good profiler to profile both memory and time spent in each module
2) Identify the bottleneck processes
3) Run non-dependent processes in parallel
4) Question the flow of the process and see how a dependent process can be made into a non-dependent one. In case it cannot, break the dependency into smaller pieces so that a process waits on the smaller dependent task instead of the bigger one.
5) Do 3 and 4 again once the code is stabilized and ur still not satisfied.
