Saturday, December 30, 2006

Memory Issues using Java ThreadPoolExecutors

Java ThreadPoolExecutors are a very convenient way to make your applications concurrent. We recently began a drive to refactor most of our code to make use of these thread pools. We had a particular process that read XML files from the filesystem, did some transformations on the XML, and wrote the transformed XML into the DB. When we started testing the initial code, we ran into serious OutOfMemoryErrors for even a few hundred XML files.

This was a serious drawback. I looked at our code and found we had set our pool size to 15 with a bounded input queue of 500, which meant only 515 XML files should be in memory at any given point in time. This was puzzling, since that should never max out a 1.5 GB heap.

Roughly our process was like

XML File --> Callable --> Thread Pool (insert into Db) --> Return boolean success

The pool took a Callable that held a reference to the XML and wrote it into the DB.

On further analysis, the only suspicious thing in the code was an innocuous-looking ArrayList. This list held all the Future objects so that we could iterate through it and wait for all the inputs to be processed before terminating. Why would a list of Future objects cause issues?

To identify the root cause I looked into the JDK ThreadPoolExecutor implementation and found the following:

1) When we create a Callable task and submit it to the ThreadPoolExecutor using any of the submit methods, a FutureTask object wrapping the Callable is created and returned as the result of the submit call.

2) This FutureTask is a concrete class implementing both Future and Runnable, and it is what actually gets queued. The ThreadPoolExecutor never takes a Callable directly.

3) When the ThreadPoolExecutor is ready to execute a new task, it picks up the task (a FutureTask or a Runnable) and calls its run method.

4) The FutureTask stores the Callable as an instance variable. Its run method invokes call and stores the returned result in another instance variable. At this point both the Callable and its result are held by the FutureTask.

5) Neither the Callable nor its result is ever set to null in the FutureTask.

So since all the Callable objects were in memory, and each Callable maintained a reference to a DOM object, we maxed out memory! So we came up with a set of rules for using Callables to make life simpler.

Rules when using Callables and Futures:

A) Never maintain large state in a Callable. The state will not be GC'ed as long as a reference to the Future is held.
B) If you do need a lot of state in a Callable, make sure you clear it at the end of the call method.
C) Never hang on to Futures indiscriminately. Doing so prevents the Callable and its return value from being GC'ed.
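The fix that follows from these rules can be sketched with a CompletionService, which hands Futures back as tasks finish so we never accumulate the whole list. This is a minimal sketch, not our production code - the task body (a byte array standing in for the parsed DOM) and the pool size are made up for illustration:

```java
import java.util.concurrent.*;

public class BoundedFutures {
    // Hypothetical task: in the real system this held a parsed DOM;
    // here a byte[] stands in for the large state.
    static Callable<Boolean> task(final int id) {
        return new Callable<Boolean>() {
            public Boolean call() {
                byte[] payload = new byte[1024]; // stand-in for a large DOM
                return payload.length > 0;       // pretend the DB insert succeeded
            }
        };
    }

    public static int process(int files) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // The CompletionService hands back Futures as they finish, so each
        // Future (and the Callable it holds) becomes garbage as soon as we
        // are done with it - no ever-growing ArrayList of Futures.
        CompletionService<Boolean> cs =
                new ExecutorCompletionService<Boolean>(pool);
        for (int i = 0; i < files; i++) cs.submit(task(i));
        int ok = 0;
        for (int i = 0; i < files; i++) {
            if (cs.take().get()) ok++;   // take() blocks until one task completes
        }
        pool.shutdown();
        return ok;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(process(100)); // prints 100
    }
}
```

The only reference we hold at any moment is the one Future returned by take(), so the retained memory stays bounded by the pool size plus the queue, not by the total number of files.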

I don't understand why FutureTask needs to hold on to the Callable forever. Why can't the executing thread, on completion, set the result field in the FutureTask and null out the reference to the Callable? I don't have an answer yet, but this sounds logical to me. Can someone please educate me?


Monday, July 24, 2006

Improving performance by changing system bottlenecks

We were in the process of trying to tune some code that has been around for about 2-3 years. The system is pretty straightforward. We get a bunch of files for each country. We read and process each file in three groups - pre, main and post - where the processing done in each group depends on some processing in the previous group. In the post process, we read back data that was written into the DB by the pre and main processes, do some further processing on it, and write it back to the DB. The reason we read from the DB is that the amount of data processed in pre and main is huge and cannot be kept in memory until the post process is triggered.

We were already using threads to run the sub-processes within each group (pre, main & post) in parallel unless there were dependencies. We were using connection pools and object pools for heavy objects. We were at a loss to figure out what more could be squeezed out. So we started questioning our flow..

This is what our simplified conversation looked like, though these questions were raised over a period of 2-3 days and not all on the same day.

Q: Which process takes the most time?
A: The post process takes as much time as pre and main combined.

Q: How much of the post process time is actual value-adding work?
A: Reading/writing data takes 98% of the time and processing the data takes 2%.

Q: Why did we have to go to the DB in the post process to read data that we wrote into the DB in the same JVM during pre and main?
A: Coz holding that data in memory would increase heap usage by close to 900MB

Q: What would we need in order to stop reading data from the DB?
A: A good cache manager that spills to disk if heap usage grows, and whose read time beats a DB read

Q: Would things like JCache work?
A: They would, but our keys are not single objects - they are queries with 2-3 where-clause entries.

Q: Why not write our own cache implementation?
A: Get a life !

Q: What is the distribution of reads/writes?
A: For every 7 reads we do one write

Q: Why are reads so slow?
A: Coz it's a network call, you bozo

Q: Will having the DB on the same box as the Java process help?
A: Might, but mostly might not, since we still have to go through all 7 network layers, plus the unix box is connected to the DB box via a dedicated 100 Mbps link

Q: How do you get rid of the 7 network layers?
A: Only if you put the DB process inside the Java process

.. and then it dawned on us: would an in-memory database remove this bottleneck? We decided to cache all the data in an in-memory database and read from that in the post process to speed up the whole process.

Then we thought again... now that there is no need to hit the DB, what more can be pruned off?


Q: Dude, why do we need the post process? Can you tell me once again?
A: To add data from the main and pre processes into the DB

Q: Why can't we do it in main itself?
A: Hmm.. historically it was never done that way... but I think it makes sense too... we need some data from the pre process to be mixed with some data from the main block, so that's the reason I suppose

Q: Can't we mix the pre block data with the main block data in the main/pre blocks themselves?
A: Oh yeah, only if you want to read the same file in both blocks

Q: How long does it take to read the file to get the data in the pre block?
A: Hmm... not more than 20-30 seconds max.. so I suppose it should be ok to read the file multiple times without any performance issues

Q: Do you still need the post process block?
A: Hmm... most of the data-mixing can be done in the main/pre blocks by parsing some files in both, so we don't have to re-query the DB in the post block - that saves us around 10K queries. Some processes still need to stay in the post block, but they are lightweight.

Q: Hmm... we still block when moving from each process block, like pre to main, waiting for the Oracle queries to complete.
A: Hmm.. that's interesting... since we have an in-memory DB, inserts into it are nearly 3-4x faster than Oracle inserts. So why not block only on the in-memory queries and let the Oracle inserts go on in the background? We can use a JDK 5 concurrent pool to run the Oracle queries in the background and let the JVM terminate when all the futures (java.util.concurrent.Future) are done.

A: You better stop now... my head's spinning... argghhh !

FYI, we used Apache Derby as our in-memory DB to speed up the process. We used two thread pools: one ran the insert statements into the in-memory Derby DB and another inserted into the Oracle DB. We ended up parsing a 10MB XML file twice, but parsing with SAX only took around 20 seconds of our processing time, so it was no biggie. Also, contrary to popular belief, running the two sets of queries did not degrade performance, since we were no longer blocking on the DB that took a long time to execute. We effectively re-arranged our bottlenecks so that we only waited for the Oracle inserts to complete when we were ready to end the process and shut down the JVM.
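The two-pool arrangement can be sketched roughly like this. The class and counters are hypothetical stand-ins, with the Derby and Oracle inserts reduced to counter increments - the point is the shape: block on the fast pool, let the slow pool drain in the background, and only wait on its Futures at shutdown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class TwoPoolPipeline {
    static final AtomicInteger fastDone = new AtomicInteger(); // "Derby" inserts
    static final AtomicInteger slowDone = new AtomicInteger(); // "Oracle" inserts

    public static void run(int batches) throws Exception {
        ExecutorService fastPool = Executors.newFixedThreadPool(4); // in-memory DB
        ExecutorService slowPool = Executors.newFixedThreadPool(4); // Oracle
        List<Future<?>> slow = new ArrayList<Future<?>>();
        for (int i = 0; i < batches; i++) {
            // Block on the fast in-memory insert before moving on...
            fastPool.submit(new Runnable() {
                public void run() { fastDone.incrementAndGet(); }
            }).get();
            // ...but let the slow insert run in the background.
            slow.add(slowPool.submit(new Runnable() {
                public void run() { slowDone.incrementAndGet(); }
            }));
        }
        // Only at shutdown do we wait for the slow inserts to drain.
        for (Future<?> f : slow) f.get();
        fastPool.shutdown();
        slowPool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        run(50);
        System.out.println(fastDone.get() + " " + slowDone.get());
    }
}
```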

So what did I learn from the entire experience?

Any performance improvement process should consist of the following steps:

1) Use a good profiler to profile both memory and time spent in each module
2) Identify the bottleneck processes
3) Run non-dependent processes in parallel
4) Question the flow of the process and see how a dependent process can be made independent. If the dependency cannot be removed, break it into smaller pieces so that a process waits on a smaller dependent task instead of a bigger one.
5) Repeat 3 and 4 once the code has stabilized and you're still not satisfied.


Thursday, May 11, 2006

Why Java pales in comparison to Ruby

Ruby stands out in comparison with Java in a lot of ways. Its dynamic nature allows for a lot of interesting ways to look at programming. For someone coming from the Java world, the nice hacks that you could never do in Java include:

  1. Intercepting messages (aka method calls) to objects using callbacks.
  2. Defining a class instance per object - we can change the class definition associated with a single instance of an object! You can imagine what you can do with that.
  3. Dynamic typing - any object that responds to a given message can be used, so there's no messing around with casts. Code becomes cleaner and shorter.
  4. Dynamically extending the functionality of any class - no class is sealed/final, and we can define methods on Object itself, with all objects seeing the new method immediately.
  5. Dynamically creating/removing methods and classes on the fly.
  6. Support for some forms of functional programming.
Anyway, this list is not comprehensive; it just contains what springs to mind immediately.

Java purists (lovers?) can argue that, beyond the language constructs, things like VM optimization, non-green threads and extensive libraries that do just about anything would take a long time to materialize in Ruby. Both have their merits and demerits. But for something built by one person in his spare time, without big-corp backing (the likes of Sun, IBM and Oracle), Ruby sure has come a looooong way.


Monday, May 08, 2006

Future of Java Synchronization - Escape Analysis, Lock Coarsening & FastPath

Future releases of Java have a few important synchronization-related changes. The new features are:

1. Fast Path

Ok.. to understand what fast path means, let us look at how synchronization works in Java. Every object in Java must support synchronization, but only a tiny fraction of the objects we use will ever be synchronized on. So the memory overhead of implementing synchronization has to be minimal. Hence all the information needed during synchronization is stored in a separate class, and objects of this class are referenced when an object is being sync'ed on.

The class that holds the sync info has various fields, among them a counter holding the number of threads blocked/waiting on the object, an OS-specific semaphore (a very heavyweight object) and a counter to track nested syncing on the same object by the same thread.

When a thread attempts to sync on an object for the first time, the JVM sees that the object either has no sync info instance associated with it, or that the sync info instance has a waiting-thread count of 0. Both of these mean there is no contention on this object yet, so the sync call proceeds down the fast path: it either creates a sync info instance and writes the address of this instance into the object (the one we're sync'ing on) using a CAS (compare-and-swap) instruction, or updates the owner field in the sync info instance to refer to the new thread.

If the sync info instance has a non-zero count (yeah, we're screwed!) the JVM blocks the thread on the OS-specific semaphore. This operation is heavyweight and is called the slow path. I don't know why it is so heavy and can only speculate, so let me not get into that without more info.

There are two ways of implementing this blocking: an infinite loop that checks whether the lock has been freed (spin locking), or the dreaded OS semaphore. The first is CPU-intensive and only makes sense when locks are held for very short durations; the second can be used anytime. Apparently the JVM will spin for short sync operations and fall back to the semaphore otherwise. Also, the compare-and-swap instruction itself is being replaced by something more efficient - dunno what, though.

All of this makes fast path locking go through even faster, and the spinning ensures that semaphores are used as rarely as possible, making the sync operation itself fast.
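The fast path/slow path split can be illustrated with a toy spin lock built on a CAS. This is only an analogy for what the JVM does internally - a real JVM parks contended threads on an OS semaphore instead of spinning forever:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    // Fast path: a single CAS flips the flag from free to held.
    // The failure branch is our "slow path" - here just a spin with yield,
    // where a real JVM would eventually block on an OS semaphore.
    public void lock() {
        while (!locked.compareAndSet(false, true)) {
            Thread.yield(); // back off while another thread holds the lock
        }
    }

    public void unlock() {
        locked.set(false);
    }

    public boolean isLocked() {
        return locked.get();
    }

    public static void main(String[] args) throws Exception {
        final SpinLock lock = new SpinLock();
        final int[] counter = {0};
        Thread[] ts = new Thread[4];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < 1000; j++) {
                        lock.lock();
                        try { counter[0]++; } finally { lock.unlock(); }
                    }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        System.out.println(counter[0]); // 4000
    }
}
```

In the uncontended case lock() costs exactly one CAS - that is the whole appeal of the fast path.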

2. Escape Analysis to help Lock elision

If an object being locked by a thread can never escape to another thread, the synchronization on it can never be observed and can be eliminated outright (lock elision). This makes the code faster, since memory does not have to be flushed and the lock does not have to be checked or created.

3. Lock Coarsening

Consider the following code

private String getMessage()
{
    StringBuffer sb = new StringBuffer();
    sb.append("line one");
    sb.append("line two");
    sb.append("line three");
    return sb.toString();
}

Each of the StringBuffer.append calls involves locking on the StringBuffer itself. Three calls to append mean we perform the following operations thrice:

1. Flush of memory to main memory
2. Acquisition of the lock on the StringBuffer
3. Flush of memory to main memory


Since the above function can be re-interpreted as

private String getMessage()
{
    StringBuffer sb = new StringBuffer();
    synchronized (sb)
    {
        sb.append("line one");
        sb.append("line two");
        sb.append("line three");
    }
    return sb.toString();
}

the JVM can cut the number of sync operations down to one. This is called lock coarsening, since we effectively coarsen the lock.

Some of these three features are in Mustang (Java 6) while the rest are slated for Dolphin (Java 7). Mustang already does some escape analysis, but apparently the info is not yet used for lock elision, and only some simple coarsenings are currently supported. So expect to wait longer until your code can run fast. To know more, refer to David's blog entry.


Thursday, April 20, 2006

First Experience with Ruby

It seems Ruby is becoming omnipresent these days. A lot of hype, hope and buzz surrounds it. A few of my friends moved to Ruby and went ooh-aah over the new-fangled toy that does wonders. But when I tried to get them to tell me what was so interesting and powerful about it, the common answer was that it was so simple to write code. Sufficiently piqued, I decided to get some first-hand experience.

So I borrowed the Pragmatic Programmers book on Ruby (the online version can be found here) and started reading through it. It is a good book, my only grouse being that some topics were not covered in as much depth as I would have preferred.

Anyway, after poring through half the book, the features that spring out are:

1. Fully OO

Everything is a method call on an object, including new, loops etc. For e.g. to create a new instance of a class you call MyObject.new. Numbers (the Java equivalent of ints) are all instances of the Fixnum class, so you can do things like 3.times and 3.step(30, 3). Both of these are looping constructs, the first looping 3 times and the second behaving like a Java for loop.

2. Iterators and Blocks

Ruby defines a bunch of iterators that operate over Ruby containers (arrays, lists and hashes). Coming from the Java world, an iterator to me just meant a way to loop through the contents of a collection. But a Ruby iterator is a different beast that not only loops through the contents but also executes a 'block' of code for each object in the collection. To make it seem less daunting, let's look at an example..

[ 1,2,3,4,5 ].each {|i| print i } - this would print the numbers 1 through 5 on the console.

In Java, we'd have to do..

int[] numArr = {1, 2, 3, 4, 5};
for (int i = 0; i < numArr.length; i++) {
    System.out.print(numArr[i]);
}

3. Closures


A closure is a block of code that retains the values of the instance/local variables it accesses. So we can create a closure that accesses local variables defined and set above it, and when that closure is passed to a method in a different class, it retains the local variable values as they were in the original context.

Martin Fowler has a very good article on closures. Read it to learn about the power of closures.

4. Assignments and Operator Overloading

Assignments can be chained, since each assignment returns its value as the result of the assignment expression. So a = (b = 1 + 2) + 3 sets a to 6 and b to 3. Also, both if and case return the value of the last expression executed, so the result of an if or case expression can be assigned directly to a variable.

Operator overloading is also supported. Damn, Java seems to be the only one not supporting it these days.

5. Getter/Setter

We can have getters/setters for object variables without writing methods. In Ruby classes, attr_reader and attr_writer followed by variable names make those attributes readable/writable.

6. Mixins

Ruby, like Java, supports only single inheritance. A mixin is a powerful concept where you define multiple modules, and when you include them in your class, all the methods in the mixins become accessible in your class. A mixin does not copy the code into the including class, it just references it, so any change to the module's methods/variables affects every including class globally.

For e.g. Ruby comes with a standard module Comparable, which defines the comparison operators. For Comparable to work, our class must implement the <=> method, which the module uses. This is similar to an abstract base class in Java, except that we are allowed to have multiple of them for a single class. Pretty powerful !!

So Ruby has some features I like (all the ones above), but its arrays and collections are not strongly typed, and I have a general aversion to such things, having some bad memories from a VB/ASP project long ago.

Overall, I have to think about some real-life problems that Ruby makes easy to solve, and whether it actually scales well, before going nuts over it.


Wednesday, April 19, 2006

10 Tips for good API Design

All my life, I had complete control over my code. I could change method/class names and signatures at will, and all users were in the same code base; I could refactor endlessly. Now I have to maintain and keep enhancing an API. Sounds easy? It's actually big trouble: every method you publish ends up being called by thousands of people, and you lose the ability to refactor in a jiffy.

Public API's are forever and there is only one chance to get them right - Joshua Bloch

I learnt that good API design is an art in itself, and I picked up some basic rules that should be followed:

1.Intuitive APIs

When designing an API, always consider specific examples of the code clients would write to use it. Model APIs after commonly used patterns and existing APIs. This helps avoid cumbersome or unintuitive APIs.

2.Internal Code

All internal code that should not be called by the public should go in a separate package marked internal, e.g. public classes in com.company.product.* and internal classes in com.company.product.internal.*.

3.Expose only what's needed

Use the most stringent access specifiers possible (private/package-private/protected/public). The idea is to prevent unintended usage of the APIs. Some common tips:
1) All classes/methods that only need to be called by other classes in the same package should be declared package-private. Note there is no such thing as a package-private interface method: all methods in a package-private interface must be declared public.
2) If a method needs to be called by classes in a different package, the tendency is to make it public even though it is not a true public API. In such cases, a good approach is to make the class abstract, subclass it in the internal packages, and use static factory methods to return an instance of the internal class. This way the extra methods are exposed only in the internal packages.
3) Declaring a method as protected is as good as declaring it public, since anyone can subclass and call it. Use it only when you are sure clients should be able to override the method, or when subclasses live in different packages.
4) Make classes/methods final so they cannot be overridden. Put another way: if you assume clients will extend any public class that is not final, it is unpredictable to let them override selected methods without understanding the big picture.
5) No member variables should be public except constants (public static final).

4.Javadoc

All public methods/constants in a public class must be documented. The ease of use of your API depends on how good the Javadoc is. Good, clean Javadoc specifies no implementation detail. All operations on data visible to clients are well documented, as are error conditions. Specify clearly what each method will and, more importantly, will not do.

5.Static Factory creation

Prefer static factory methods over new for creating objects. This lets us return any object that implements the specified contract rather than exposing a truckload of classes to the public. A constructor forces the implementation to return an instance of the class itself rather than a subclass (or one of a set of subclasses), which would force that class to know about all variant behaviours.
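For example, a static factory can hide the concrete subclasses entirely. The Codec class and its subclasses here are hypothetical, made up purely to illustrate the pattern:

```java
// Hypothetical API: callers obtain a Codec by name and never see
// the concrete subclasses behind it.
public abstract class Codec {
    public abstract String encode(String s);

    // Static factory: free to return any subclass that honours the contract.
    public static Codec forName(String name) {
        if ("upper".equals(name)) return new UpperCodec();
        if ("reverse".equals(name)) return new ReverseCodec();
        throw new IllegalArgumentException("unknown codec: " + name);
    }

    // Package-private subclasses - invisible outside the API package.
    static class UpperCodec extends Codec {
        public String encode(String s) { return s.toUpperCase(); }
    }

    static class ReverseCodec extends Codec {
        public String encode(String s) {
            return new StringBuilder(s).reverse().toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(Codec.forName("upper").encode("api")); // API
    }
}
```

Adding a new variant later means adding one package-private subclass and one branch in the factory - the public surface never changes.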

6.Contexts

Don't store contexts in objects; prefer passing them as method parameters. If we store contexts in objects and pool those objects, we have to ensure the context stays valid throughout the object lifecycle, which is a big pain.

7.Thread safe classes

Immutable classes and flyweights are easily thread-safe. Prefer to use them judiciously.
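A minimal sketch of an immutable class - final class, final fields, no setters, operations returning new instances (the Money class is a made-up example):

```java
// Immutable value class: instances can be shared freely across threads
// because their state can never change after construction.
public final class Money {
    private final long cents;
    private final String currency;

    public Money(long cents, String currency) {
        this.cents = cents;
        this.currency = currency;
    }

    public long cents() { return cents; }

    public String currency() { return currency; }

    // Operations return new instances instead of mutating state.
    public Money plus(Money other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException("currency mismatch");
        }
        return new Money(cents + other.cents, currency);
    }

    public static void main(String[] args) {
        Money a = new Money(100, "USD");
        Money b = a.plus(new Money(50, "USD"));
        System.out.println(b.cents()); // 150
    }
}
```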

8.Helpers & Utils

Any util class that needs to be associated with state should be made a helper that is instantiated with that state. Utils should be final and non-instantiable, with all methods static.

9.Class Names and Method names

Make names consistent throughout the API - a length() method returns the length of both String and StringBuffer. If you get the naming right, most people will be able to use the API intuitively without poring over the docs. This makes the APIs very simple to use.

10. Exceptions

Never throw a single exception type with different messages. Use many exception classes, roughly one per kind of error. Be sure to put them in a proper hierarchy, like java.io.IOException's, so that clients have the flexibility to declare a single catch block for a general category of exceptions.
Always ensure that the exception message is picked up from a resource bundle or some other file so that messages can be easily translated. Hey, you never know who is going to use your APIs!
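A sketch of such a hierarchy - the exception names here are invented for illustration, with a base type per API and one subclass per error kind:

```java
public class ExceptionHierarchyDemo {
    // Base exception for the whole (hypothetical) storage API.
    static class StoreException extends Exception {
        StoreException(String msg) { super(msg); }
    }

    // One subclass per kind of error.
    static class RecordNotFoundException extends StoreException {
        RecordNotFoundException(String msg) { super(msg); }
    }

    static class DuplicateKeyException extends StoreException {
        DuplicateKeyException(String msg) { super(msg); }
    }

    // Clients can catch the base type for a whole category,
    // or a specific subclass when they can actually recover.
    static String classify(StoreException e) {
        if (e instanceof RecordNotFoundException) return "not-found";
        if (e instanceof DuplicateKeyException) return "duplicate";
        return "general";
    }

    public static void main(String[] args) {
        System.out.println(classify(new RecordNotFoundException("id 42"))); // not-found
    }
}
```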

When in doubt, leave it out. We can always go in and add something later after more deliberation.


Monday, February 27, 2006

Why a ConcurrentHashMap is so fast

We all know a ConcurrentHashMap is nearly as fast as a HashMap while providing the concurrency of a Hashtable. Some very ingenious coding makes this possible. The key to understanding the hows and whys is the structure of the ConcurrentHashMap itself.

A ConcurrentHashMap contains a final array of Segment objects. Each segment extends ReentrantLock and contains a transient volatile array of HashEntry objects. Each HashEntry object consists of final key, hash and next fields and a volatile value field.

Every get/put/remove operation derives an index from the hash of the key that maps to one of the available segments; the call is then delegated to that segment.

Take a put call. The segment first locks itself (it extends ReentrantLock), then checks whether the key already exists. If it does, it updates the volatile value field in the HashEntry object. Since the field is volatile, the new value is guaranteed to be seen by other threads in the JVM without any explicit synchronization. Voila! If the key is not present, a new HashEntry object is created and added to the head of the existing list. Since there is an array of segments, writes are spread out and not all threads will lock the same segment, allowing more concurrent writes.

Now consider a get call: get the first HashEntry in the bucket, iterate to the end, and if the key is found, return the value. No locking at all - same as a HashMap, unlike a Hashtable. So all reads work at nearly the same speed as a HashMap's.

Two things make this possible:

1) The new JMM guarantees that volatile reads are not re-ordered with volatile writes, and that any read after a write sees the updated contents without synchronization. The field holding the value reference in HashEntry is volatile, and the whole HashEntry array in each segment is volatile. So any change to a value, and every newly added HashEntry (key-value pair), is visible to all threads as soon as it is assigned, without any synchronization.


2) Final-field initialization safety: all threads will see the values of an object's final fields that were set in its constructor. Further, any variables reachable through a final field of a properly constructed object, such as the fields of an object referenced by a final field, are also guaranteed to be visible to other threads. So if a new key-value pair is added and instantly accessed by another thread, the key, hash and next pointer will have proper values and never be null.


These ensure that an add in one thread is instantly visible to other threads, without explicitly flushing memory. Does this mean no locking at all? None in the Java code, but the JVM implementation has to do some work to ensure that volatile reads return the latest written values. Since that happens at a much lower level, it is much faster.
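The segment idea can be sketched as a toy striped map. Note that this simplified version still locks on reads - the real ConcurrentHashMap's lock-free read path depends on the volatile/final subtleties described above, which are deliberately left out here:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Toy lock-striped map: each write locks only one of 16 stripes
// (analogous to a Segment), so writes to different stripes can
// proceed concurrently.
public class StripedMap<K, V> {
    private static final int STRIPES = 16;
    private final ReentrantLock[] locks = new ReentrantLock[STRIPES];
    private final Map<K, V>[] buckets;

    @SuppressWarnings("unchecked")
    public StripedMap() {
        buckets = (Map<K, V>[]) new Map[STRIPES];
        for (int i = 0; i < STRIPES; i++) {
            locks[i] = new ReentrantLock();
            buckets[i] = new HashMap<K, V>();
        }
    }

    private int stripe(Object key) {
        return (key.hashCode() & 0x7fffffff) % STRIPES;
    }

    public void put(K key, V value) {
        int s = stripe(key);
        locks[s].lock();
        try { buckets[s].put(key, value); } finally { locks[s].unlock(); }
    }

    public V get(K key) {
        int s = stripe(key);
        // Unlike the real ConcurrentHashMap, reads here still take the
        // stripe lock - the lock-free read needs the JMM tricks above.
        locks[s].lock();
        try { return buckets[s].get(key); } finally { locks[s].unlock(); }
    }

    public static void main(String[] args) {
        StripedMap<String, Integer> m = new StripedMap<String, Integer>();
        m.put("a", 1);
        System.out.println(m.get("a")); // 1
    }
}
```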

I have never seen an API that uses the features of the JMM to this extent. Hats off to Doug Lea, who made all this possible.


Tuesday, January 10, 2006

Time based UUID Generation Algorithm

We recently had a requirement to map files to UUIDs. This gives us the flexibility to refer to a file without using its name, thereby enabling renames. So we dug a little into UUIDs. java.util.UUID is the UUID implementation in Java, and this is the RFC it is linked to.

So basically a UUID (java.util.UUID) represents a 128-bit value. These bits are split as:


32 bits time_low
16 bits time_mid
16 bits time_hi_and_version
16 bits clock sequence
48 bits node

The timestamp is a 60-bit value: the UTC time as a count of 100-nanosecond intervals since 00:00:00.00, 15 October 1582.

The clock sequence is used to help avoid duplicates that could arise when the clock is set backwards in time or when the node ID changes. The clock sequence MUST be initialized (once in the lifetime of a system) to a random number to minimize correlation across systems. If the previous value of the clock sequence is known, it can just be incremented; otherwise it should be set to a random or high-quality pseudo-random value.

Node: for UUID version 1, the node field is an IEEE 802 MAC address, usually the host address. For versions 3 and 5, it is a 48-bit value constructed from a name. For version 4, it is a randomly or pseudo-randomly generated 48-bit value.

There are four basic types of UUIDs: time-based, DCE security, name-based, and randomly generated, with version values 1, 2, 3 and 4 respectively. Let's look at the time-based UUID generation algorithm.

Time based UUID creation Algorithm

These are the steps as presented in the RFC; the remarks after the dashes are my comments.

1) Obtain a system-wide global lock - How? Simple: use a java.nio.channels.FileLock! This is what the java.util.logging framework uses to ensure log entries are not overwritten when used from multiple JVMs.
2) From a system-wide shared stable store (e.g., a file), read the UUID generator state: the values of the timestamp, clock sequence, and node ID used to generate the last UUID.
3) Get the current time as a 60-bit count of 100-nanosecond intervals since 00:00:00.00, 15 October 1582.
4) Get the current node ID.
5) If the state was unavailable (e.g., non-existent or corrupted), or the saved node ID differs from the current node ID, generate a random clock sequence value - so even if someone deletes the state store file, we start off with a random number and can still get a unique UUID.
6) If the state was available, but the saved timestamp is later than the current timestamp, increment the clock sequence value - ingenious!! This means that even if you set your clock back, the UUIDs will still remain unique.
7) Save the state (current timestamp, clock sequence, and node ID) back to the stable store.
8) Release the global lock.
9) Format a UUID from the current timestamp, clock sequence, and node ID values.

The algorithm is simple and elegant, and though there are other ways of generating UUIDs, this one is easy to understand.
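The JDK's java.util.UUID exposes these fields directly, which makes the layout easy to poke at. The fixed UUID below is the well-known DNS namespace UUID from the RFC, which happens to be a version 1 (time-based) value:

```java
import java.util.UUID;

public class UuidFields {
    // Version field of a UUID given in string form.
    static int versionOf(String s) { return UUID.fromString(s).version(); }

    // 48-bit node field (the MAC address for version 1 UUIDs).
    static long nodeOf(String s) { return UUID.fromString(s).node(); }

    public static void main(String[] args) {
        // Randomly generated UUIDs are version 4.
        System.out.println(UUID.randomUUID().version()); // 4

        // The DNS namespace UUID from the RFC is time-based (version 1),
        // so its timestamp, clock sequence and node fields are meaningful.
        String dns = "6ba7b810-9dad-11d1-80b4-00c04fd430c8";
        UUID v1 = UUID.fromString(dns);
        System.out.println(v1.version());                     // 1
        System.out.println(Long.toHexString(v1.timestamp())); // 60-bit timestamp
        System.out.println(v1.clockSequence());               // 14-bit clock sequence
        System.out.println(Long.toHexString(v1.node()));      // 48-bit node
    }
}
```

Note that timestamp(), clockSequence() and node() throw UnsupportedOperationException for non-version-1 UUIDs, since those fields only exist in the time-based layout.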


Friday, January 06, 2006

Effective Developer Productivity

I just came across this article on productive coding and moving forward.

The gist of the article is about how productive a developer is each day. Every one of us knows we don't code for 8 hours a day. When we are in the mood (in "the zone") we can get close to 5 hours of productive coding. And when we're not - ah well, let's admit it - we generate a lot of internet traffic, go write some documents.. you get the idea!

And when we see that we're behind on the plan and are going to be in trouble pretty soon, we create a big imaginary boot to kick us out of the stupor and get back to work.

This piece of truth generates a lot of insights:

1) Pair programming - each pair generates more quality code per day than the two would have done individually. Why? Because if one of the developers enters his zone and starts programming, he enthuses the other enough to get him productive too. Left alone, the second developer would be twiddling his thumbs, picking his nose and waiting for something interesting (like evening!) to happen.

2) Task swaps - when I am in the zone, working furiously, and I get a small extra task on top of what I'm doing, I lose it. Face it: you're in the zone, your manager walks in and asks you to monitor the build every hour or create some quality (CMM??) documentation before the end of the day, and that task is enough to break you out of the zone. Period. So when you're in the zone, ensure you a) disable mail alerts, b) close all IMs, c) don't respond to any requests to squeeze one teeny-weeny bit of extra work into your day.

3) Getting into the zone - many a time I've come to the office, done nothing productive the whole day and left. How do you ensure you get at least a few hours of productive work in every day, just to satisfy the daily-plan-tracking manic manager? Getting started is the hard part. Start with something small and easy so that a) on completing it you feel better, and b) more importantly, you avoid attacking something big/hard first - failing to get anything working in the first half-hour will only push you further from the zone. Rack up a lot of small wins, and a few hours later you can bring out the big guns and start blazing away at those nasty bugs/functionalities.


Happy Coding !
