Random Thots

6 months of DDD !

2009-02-17T21:09:00.003+05:30

We have been learning and trying to use DDD in our project for the last 6 months. Meanwhile I got involved in a couple of other projects involving Lucene, Axis etc. and had not looked at this code for quite some time.

I finally pulled in the sources to see how DDD had turned out and to find out whether it was indeed making the code more expressive.

Some of the things that jumped out immediately were

The request DTOs had seeped into the domain classes: certain domain methods called from the service layer accept the request objects as inputs.
Conditional checks (and logic) were spread out, e.g. a series of if-else checks with some domain logic happening in each block.
Large domain classes (in terms of LOC).
All logic lived mainly in entities. No other kinds of domain objects (Value Objects, Services or Specifications) were to be found.
Almost all entities had a shallow hierarchy structure. I am being generous here: more than 90% of entities had no hierarchy at all.

We would now have to see how best to clean this up. Will keep you posted on how it goes.

Technorati Tags: DDD, Domain Driven Design

Domain Driven Design without AOP

2008-06-16T18:44:00.008+05:30

Ramnivas Laddad, in this presentation during SpringOne, makes the assertion that "Domain Driven Design cannot be adequately implemented without help of AOP and DI".

I don't quite agree with this. I believe you can get by fine and do good DDD without using AOP.

According to the talk, AOP can be roughly used for the following purposes

Dependency Injection into domain entities and VOs
Handling domain logic
Handling cross-cutting concerns

Of these, IMHO the only 'real' need for AOP is to handle cross-cutting concerns like transaction management, auditing etc.

Dependency Injection without AOP

DI'ing services/policies/repositories into an Entity/VO can be done without AOP in other, simpler ways, such as

Using callback methods like onLoad in Hibernate's Interceptor, which contain the injection logic based on the entity type. This is a slightly intrusive solution though.

A better way is to have an interface for each service/repository that is to be injected and make the entity implement the required interfaces.


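import java.io.Serializable;
import org.hibernate.EmptyInterceptor;
import org.hibernate.type.Type;

// One small interface per dependency; an entity implements the ones it needs.
public interface OrderRepositoryAware {
     public void setRepository(OrderRepository orderRepo);
}

public interface CreditRatingAware {
     public void setCreditRatingSvc(CreditRatingSvc svc);
}

public class Order implements OrderRepositoryAware, CreditRatingAware { ... }

public class DependencyInjectingInterceptor extends EmptyInterceptor {

     // Hibernate calls onLoad just before an entity is initialized from the database.
     @Override
     public boolean onLoad(Object entity,
        Serializable id,
        Object[] state,
        String[] propertyNames,
        Type[] types) {

          if (entity instanceof OrderRepositoryAware) {
               ((OrderRepositoryAware) entity).setRepository(...);
          }
          // ...... similar checks for the other *Aware interfaces

          return false; // the loaded state itself was not modified
     }
}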

However, the same logic needs to be replicated in the Factory objects that create new entities. This is the drawback of this approach.

Use a ServiceRegistry to get a reference to the required service/repository etc. To unit test, you just need to stub out the ServiceRegistry.

public class Order {
    public void validatePayments() {
        ServiceRegistry.getRepository().getPaymentHistory();
    }
}

The singleton initialization would have to change to create mocks when used in testing.
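A minimal sketch of how such a registry could be made stubbable. The post does not show the registry itself, so the `register` method, the `PaymentRepository` interface and the rule inside `validatePayments` are assumptions made purely for illustration.

```java
import java.util.List;

// Hypothetical repository interface; the post only shows a getPaymentHistory() call.
interface PaymentRepository {
    List<String> getPaymentHistory();
}

// Static holder whose content can be swapped, e.g. for a stub in a unit test.
final class ServiceRegistry {
    private static PaymentRepository repository;

    private ServiceRegistry() {}

    // Assumed registration hook: production wiring and tests both call this.
    static void register(PaymentRepository repo) {
        repository = repo;
    }

    static PaymentRepository getRepository() {
        if (repository == null) {
            throw new IllegalStateException("No repository registered");
        }
        return repository;
    }
}

class Order {
    // Illustrative rule only: payments are valid when some history exists.
    boolean validatePayments() {
        return !ServiceRegistry.getRepository().getPaymentHistory().isEmpty();
    }
}
```

In a test, registering a lambda or mock before calling `validatePayments()` is enough; the entity never needs to know it is talking to a stub.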

Domain Logic without AOP

In his presentation Ramnivas takes the example of a PaymentAuthorization service failure.

If PaymentAuthorization fails, try authorizing using other service providers.
In case all providers fail, do a temp auth based on payment history. When the product is about to be shipped, secure the payment.

The example uses an around advice for PaymentProcessor.process which has retry logic. To do temp auth an Inter-type declaration is used.

In my mind, the clarity of the code is vastly reduced by using AOP here. The domain logic that is woven in is not apparent or visible when looking at the Order class. The domain class Order is incomplete without considering the two aspects. This breaks the clarity of code and model that DDD offers. There is no apparent gain from using AOP here.

In this example, both aspects are core domain logic, and in a real-world model some object would have ownership of these bits of functionality. Either Order or a Domain Service like OrderProcessSvc would be the best object to own this logic. Choosing between entity and service to host it depends on whether the Order should own PaymentAuthorization (which I think it should) or not. But the main thing is to encapsulate this logic in its rightful owner.
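To make the point concrete, here is a rough sketch of the same retry and temp-auth rules written as a plain domain service. `PaymentAuthorizer`, `AuthorizationResult` and the `goodPaymentHistory` flag are hypothetical names, and the sketch treats a decline and an outage alike; the talk's actual code is not reproduced here.

```java
import java.util.List;

// Hypothetical abstraction over one external payment provider.
interface PaymentAuthorizer {
    boolean authorize(long amount);
}

enum AuthorizationResult { AUTHORIZED, TEMPORARILY_AUTHORIZED, REJECTED }

// Plain domain service: the fallback rules are visible in one place.
class OrderProcessSvc {
    private final List<PaymentAuthorizer> providers;

    OrderProcessSvc(List<PaymentAuthorizer> providers) {
        this.providers = providers;
    }

    // goodPaymentHistory stands in for whatever rule grants a temp auth.
    AuthorizationResult authorize(long amount, boolean goodPaymentHistory) {
        for (PaymentAuthorizer provider : providers) {
            try {
                if (provider.authorize(amount)) {
                    return AuthorizationResult.AUTHORIZED;
                }
            } catch (RuntimeException providerFailure) {
                // Provider failed: fall through and try the next one.
            }
        }
        // Every provider failed: temp auth based on payment history.
        return goodPaymentHistory
                ? AuthorizationResult.TEMPORARILY_AUTHORIZED
                : AuthorizationResult.REJECTED;
    }
}
```

Anyone reading `OrderProcessSvc` sees the whole rule, which is the clarity argument made above.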

To me, aspects in DDD can be used for things like enforcing rules (augmenting Java's limited support for scoping, e.g. an aspect that raises a compilation error whenever an entity is constructed using the 'new' operator) and handling service failures/issues at the boundary with infrastructure services.

I believe Domain Logic should only be contained in POJOs to maintain clarity and readability. Any thoughts?

Technorati Tags: DDD, Domain Driven Design

Associations in Domain Driven Design (DDD)

2008-06-07T22:02:00.003+05:30

This post is part of a series I am writing on Domain Driven Design (DDD).

Associations between objects are typically not well thought out during design. Most of the time, the associations are dictated by the underlying data model. This results in a lot of associations and traversal paths that may not make sense in the domain.

For e.g. an Order has associations to multiple Products. Your natural traversal would be from an Order to a Product. I am yet to see a usecase where you pick a Product and 'traverse' to the list of Orders it is associated with. Normally the kind of functionality supported would be to list 'all open orders for a product' or 'all orders for this product from a given customer' etc., and these would typically be queried from the DB using a DAO.

According to Eric Evans, associations should

Impose a traversal direction
Add qualifiers and reduce multiplicity
Not be present if they are non-essential

Moving from a bi-directional association to a uni-directional one is one of my favourites, since it's generally easy to accomplish and offers the biggest 'bang for the buck', so to say. The other two are simple enough.

Unidirectional vs Bidirectional associations

A bidirectional association means that both objects can only be understood together. There are real world usecases where such a necessity exists. For e.g. an Order needs to have a bi-directional association with its Settlement object. However in many cases, like the first example, a uni-directional association would suffice.

The advantages of unidirectional associations are

They bring out the natural bias of the domain towards one particular traversal direction
They make the association more communicative
They provide a natural constraint on where to place domain logic, since you can only place logic on the class which can reach out to all classes involved.

When to prefer Bi-directional over Uni-directional

However there were some scenarios where I started using unidirectional associations but in the end wound up reverting them to bi-directional since it felt more natural.

Use bi-directional associations if your usecases dictate there are multiple entry points for the same operation.

For e.g. an OrderList was initially modelled with a uni-directional association to its list of Orders. Any domain logic that affects multiple orders can only be triggered from the OrderList.

One of the usecases detailed that cancelling the last Order in an OrderList cancels the entire list. Another usecase said that cancelling an OrderList should in turn call cancel on each Order within the list.

With back pointers in place, naively supporting the above 2 usecases led to some circular call issues. Order.cancel() calls OrderList.cancel(), which internally would end up calling Order.cancel() for all Orders! One solution was to have Order.cancel and Order.doCancel, where one would not trigger the call to OrderList. This is plain ugly.

The best solution is to make all cancel Order calls be handled by the OrderList. So let Order.cancel() delegate the cancel call to OrderList.cancelOrder(OrderToCancel).

In the uni-directional model the Order does not have a back pointer to its OrderList, so the Order had to use a repository to look up its OrderList. Moving to bi-directional, we could use Hibernate's hydration support to get a reference to the OrderList from the Order and vice versa.
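A small sketch of the delegation described above, assuming a back pointer from Order to OrderList. The `markCancelled` helper and the "all orders cancelled" check are illustrative choices, not the project's actual code.

```java
import java.util.ArrayList;
import java.util.List;

class OrderList {
    private final List<Order> orders = new ArrayList<>();
    private boolean cancelled;

    Order addOrder() {
        Order order = new Order(this);
        orders.add(order);
        return order;
    }

    // Single place that knows what cancelling one order means for the list.
    void cancelOrder(Order orderToCancel) {
        orderToCancel.markCancelled();
        boolean allCancelled = true;
        for (Order order : orders) {
            allCancelled &= order.isCancelled();
        }
        if (allCancelled) {
            cancelled = true; // the last active order is gone
        }
    }

    // Cancelling the list cancels every order without calling back into cancelOrder.
    void cancel() {
        for (Order order : orders) {
            order.markCancelled();
        }
        cancelled = true;
    }

    boolean isCancelled() { return cancelled; }
}

class Order {
    private final OrderList owner; // back pointer of the bi-directional association
    private boolean cancelled;

    Order(OrderList owner) { this.owner = owner; }

    // Entry point for callers: the decision is delegated to the OrderList.
    void cancel() { owner.cancelOrder(this); }

    // Package-private state change, only meant to be used by OrderList.
    void markCancelled() { cancelled = true; }

    boolean isCancelled() { return cancelled; }
}
```

Both entry points (`Order.cancel()` and `OrderList.cancel()`) now end in the same state changes, and neither can trigger the other in a loop.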

Use bi-directional associations if there are usecases which require data from the other object even though the entry point is a different object.

The problem is compounded with *-to-many relationships where 'many' is actually quite a large number, which causes a resource drag during object hydration.

For e.g. over the course of a year, a single Customer can have thousands of Orders. Loading all Orders every time a Customer object is created is bad for performance. But you would obviously need Customer data to fulfil a given Order, and all Order data for a given Customer to find credit worthiness.

In such cases, maintain the bi-directional association indirectly by having Customer.getOrders load Order data on demand by querying the database.
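A sketch of that on-demand lookup, assuming a hypothetical `OrderRepository` with a `findOrderIdsByCustomer` query (orders are reduced to id strings to keep the example short).

```java
import java.util.List;

// Hypothetical repository; in a real project this would be a DAO hitting the database.
interface OrderRepository {
    List<String> findOrderIdsByCustomer(long customerId);
}

class Customer {
    private final long customerId;
    private final OrderRepository orderRepository;

    Customer(long customerId, OrderRepository orderRepository) {
        this.customerId = customerId;
        this.orderRepository = orderRepository;
    }

    // No Order collection is held as a field, so creating a Customer stays cheap;
    // the query only runs when someone actually asks for the orders.
    List<String> getOrders() {
        return orderRepository.findOrderIdsByCustomer(customerId);
    }
}
```

The association still reads as `customer.getOrders()` to callers, but nothing is hydrated up front.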

Associations when modelled correctly ensure that there is always only one way to access data. This enhances readability and as a consequence maintainability of code.

Technorati Tags: DDD, Domain Driven Design

Alan Kay's Metaphor for OO

2008-06-03T08:26:00.010+05:30

Alan Kay, one of the creators of Smalltalk, has this to say about Object Oriented Programming

The most obvious parallel is the human body, which is divided into
trillions of cells, each performing its own specialized task.

Like objects in software produced with object-oriented programming,
human cells do not know what goes on inside one another, but they can
communicate nevertheless, working together to perform more complex
tasks. "This is an almost foolproof way of operating", Kay says.

By mimicking biology in this way, we can minimize many of the problems
inherent to the construction of a complex computing system. A developer
can focus on one simple module at a time, making sure it works
properly, and move on to the next. Not only is building a system this
way easier, but the system will be much more reliable. And when it does
break down, it is simpler to fix, because problems are typically
contained within individual modules, which can be repaired and replaced
quickly. By contrast, a monolithic system is like a massive mechanical
clock containing innumerable turning gears, none of which has its own
internal logic or communicates information. Each gear functions
unintelligently and only in relation to other gears. The design is
hopelessly flawed. When building gear clocks, eventually it
will reach a certain level of complexity and it falls in on itself.

Great, crisp description. No kludgy words like black boxes, blueprints etc. It clearly highlights the pitfalls in a procedural style of development.

I like the way in which he weaves in the concepts of Encapsulation (Info Hiding) and Message Passing.

Encapsulation is drilled into everyone doing OO. But most developers have still not got the subtle difference between message passing and method invocation. Can't say I blame anyone (I too didn't know till recently), since you don't need to know about something you can't see or use when working in Java, C++ etc.

A message is a signal from one object to another that requests the receiving object to carry out one of its methods. The message consists of the name of the receiving object and its arguments, that is, the method it is to carry out and any parameters the method may require to fulfill its charge. Hence the end state of a message is "method invocation".

Ideally every object will get a message and check if it can respond to it. If it can, the appropriate method for this message is invoked. Else the object can respond in whatever way it sees fit. When using Java this is not evident and not exposed at all. In Ruby at least you have some access to this via the method_missing feature.
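Java has no method_missing, but the idea can be approximated with reflection. The `Messages.send` helper below is a hypothetical illustration of "receive a message, check whether you understand it, otherwise respond some other way"; it is not a language feature.

```java
import java.lang.reflect.Method;

// Hypothetical helper that treats a call as a message the receiver may not understand.
final class Messages {
    private Messages() {}

    static Object send(Object receiver, String message) {
        try {
            // Look up a public no-argument method named after the message.
            Method method = receiver.getClass().getMethod(message);
            return method.invoke(receiver);
        } catch (NoSuchMethodException notUnderstood) {
            // The receiver decides how to respond to an unknown message.
            return receiver + " does not understand '" + message + "'";
        } catch (ReflectiveOperationException failure) {
            throw new IllegalStateException(failure);
        }
    }
}
```

With this, `Messages.send("hello", "length")` ends in an ordinary method invocation, while `Messages.send("hello", "fly")` produces a fallback response instead of a compile error.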

I digress. Coming back, I still remember my first OO concepts course, and the convoluted/over-simplified examples that were presented. I just wish I had come across stuff like this at that time.

Domain Services in Domain Driven Design (DDD)

2008-05-31T14:02:00.006+05:30

This post is part of a series I am writing on Domain Driven Design (DDD).

"A SERVICE is an operation offered as an interface that stands alone in the model, without encapsulating state, as ENTITIES and VALUE OBJECTS do." [Evans 2003]

The above defines what a Domain Service is. Any behaviour, operation or activity that cannot logically fit into an object is a good candidate for being exposed as a service. However, the important thing to note here is that the service being exposed should represent a domain concept. This ensures you don't get carried away and create tons of services with very few entities and VOs.

Rules to be followed by Domain Services

A domain service has to abide by only 2 cardinal principles

It has to be stateless
It should have some meaning in domain e.g. AccountTransferService, AdjustOrderService etc

Identifying Domain Services

I carve out a service in any one of the following conditions

The concept behind an entity/VO does not lend itself to the activity being modelled. If adding the activity concerned dilutes the meaning of the entity/VO, the activity is better off exposed as a service. In other words, if there is no obvious owner, expose the behaviour as a service.

When the activity being modelled spans different entities/VOs that are not part of the same aggregate, expose a domain service that co-ordinates the activity across the many objects involved.

For e.g. when adjusting or modifying an existing order, the order, payment and delivery entities have to be updated. I find it easier and more intuitive to model this as a service rather than have all the logic in an Order.adjust method. To access and modify the Delivery/Payment entities, the Order entity may have to do a deep object graph traversal, which it may not always be qualified to handle. The OrderAdjustService, however, would use a repository to look up all concerned entities and co-ordinate the adjust method among them. This ensures all domain logic of how adjustments happen is still contained within the entities and the service just acts as an orchestrator.

In fact I like to have an Order.adjust method that the UI/Application layer calls. This Order.adjust method would then call the OrderAdjustService, passing in the required info. This provides a clean interface and separation of concerns.
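Roughly, the arrangement could look like the sketch below. The `Payment` and `Delivery` fields, the `OrderRelatedRepository` interface and the method signatures are all assumptions made to keep the example self-contained.

```java
// Hypothetical entities, reduced to the one value each that an adjustment touches.
class Payment {
    private long amountDue;
    void adjustFor(long quantity, long unitPrice) { amountDue = quantity * unitPrice; }
    long amountDue() { return amountDue; }
}

class Delivery {
    private long boxes;
    void adjustFor(long quantity) { boxes = quantity; }
    long boxes() { return boxes; }
}

// Hypothetical repository the service uses to find the related entities.
interface OrderRelatedRepository {
    Payment paymentFor(long orderId);
    Delivery deliveryFor(long orderId);
}

// Domain service: it only orchestrates; each entity keeps its own adjust logic.
class OrderAdjustService {
    private final OrderRelatedRepository repository;

    OrderAdjustService(OrderRelatedRepository repository) {
        this.repository = repository;
    }

    void adjust(long orderId, long newQuantity, long unitPrice) {
        repository.paymentFor(orderId).adjustFor(newQuantity, unitPrice);
        repository.deliveryFor(orderId).adjustFor(newQuantity);
    }
}

class Order {
    private final long orderId;
    private final long unitPrice;
    private long quantity;
    private final OrderAdjustService adjustService;

    Order(long orderId, long unitPrice, long quantity, OrderAdjustService adjustService) {
        this.orderId = orderId;
        this.unitPrice = unitPrice;
        this.quantity = quantity;
        this.adjustService = adjustService;
    }

    // The one method the application layer calls; Order owns the operation.
    void adjust(long newQuantity) {
        this.quantity = newQuantity;
        adjustService.adjust(orderId, newQuantity, unitPrice);
    }

    long quantity() { return quantity; }
}
```

The application layer only ever sees `order.adjust(...)`; the lookups and the co-ordination stay inside the domain layer.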

If certain domain logic is applicable across multiple entities and you have to work with a single inheritance model (using Java for example), then the logic can be exposed as a service.

For e.g. multiple entities like Order, Shipment etc. need to expose MarketPrice behaviour. The logic to create a price includes looking up real time market prices and currency conversion rates and then calculating the price. A MarketPrice service can encapsulate this logic. The entities should be able to refer to the MarketPrice service and delegate all MarketPrice related calls to it.

I use this when a Value Object cannot suffice. In this case there were no fields specific to market price in the entities; these were values generated and consumed at runtime.

There is a good description of this technique here. However this is not commonly used.

FAQs on Services

Aren't Services bad and shouldn't we use all objects as per OO ?

Yes, Services tend to stand orthogonal to Object Oriented Design. But Services are not always bad, since it's far worse to force fit a behaviour into some entity/VO just to be more OO compliant. That messes up the class by distorting its conceptual clarity and makes it harder to understand.

There is a huge tendency in the modelling world to use an excessive number of services. It's easy to stop fitting behaviour to the appropriate class and instead stick it into meaningless services. This is when services become bad.

What is an application service and how is it different from domain service ?

An application service layer defines a set of services that act as boundary to the domain layer. Any interaction with the domain layer passes through these application services. The application services interface with domain and infrastructure layers to get the job done. The domain layer also can talk to infrastructure layer.

Some excellent points about the distinction between the two can be found here and here.

Can an application service directly talk to a domain service ?

It depends on your preference on whether you want an explicit entity to own the operation or not.

However, one thing to be clear about is that the application service layer and the domain service layer cannot be combined into one layer. This has been covered in good detail here and I will not go into it.

My first take on this was that there is no reason for an application service to talk to a domain service. However, a colleague of mine had the view that if one service is going to call the other, then the application service was not adding much value and it could pick up some of the work that the domain service was doing.

For e.g. our previous OrderAdjustService looks up a bunch of entities and calls appropriate methods on them. We got into an argument that if this was all it did, and it had no special logic of its own, then it may as well become an application service. However, the methods being invoked were not in all cases exposed to the application layer. So it was in this context that we created an Order.adjust() method that delegated the call to OrderAdjustService. Having Order.adjust() has the advantage that it makes Order own the adjust operation. If you are not particular about that, you could call OrderAdjustService from the application service layer.

Domain services are an important part of Domain Driven Design since they give both flexibility and clarity to the model. As with all good things, use them in moderation !

Technorati Tags: DDD, Domain Driven Design

Value Objects in DDD - Part 2 - Creating VO's using Hibernate

2008-05-18T10:27:00.023+05:30

This post is part of a series I am writing on Domain Driven Design (DDD).

In my previous post on Value Objects I had mentioned that Value Objects can contain references to other entities and value objects.

I will show how to use Hibernate's features to define and use Value Objects. For illustration purposes I have taken a simple VO. If you look at any price label on any product, there are 2 constituents to it: a numerical price part and an alphabetical currency part. When paying for the product, you end up multiplying the numerical part by the conversion rate of the currency you pay the cashier in.

Building from this example, I have an Order entity which has a Price value object. The Price value object contains a reference to the Currency entity.

Order contains Price; Price refers to a Currency

Order.java looks like

public class Order {

    private long orderId;
    private String name;
    private long quantity;
    private Price price; //This is a reference to a ValueObject

    public Order(){}

    public Order(String name, long quantity, Price price) {
        this.name = name;
        this.quantity = quantity;
        this.price = price;
    }

    public double getUSDValue() {
        return price.getUSDAmount();
    }
}

Definition of Price Value Object

Price is a value object. Notice the lack of any apparent identity columns (Primary keys).

public class Price {
    private long amount;
    private Currency amountCurrency; //This is a reference to an Entity

    public Price() {}
    public Price(long amount, Currency amtCurrency) {
        this.amount = amount;
        this.amountCurrency = amtCurrency;
    }

    public double getUSDAmount() {
        return this.amount * this.amountCurrency.getUsdConvRate();
    }
}

Mapping ValueObjects in hbm files

All details of how to make Hibernate treat Price as a value object are added to the hbm mapping files.

<hibernate-mapping schema="shop" default-access="field">

    <class name="org.dddtest.entities.Order" table="ORDERS">
        <id name="orderId" column="order_id">
            <generator class="increment" />
        </id>

        <property name="name"><column name="NAME" /></property>
        <property name="quantity"><column name="QTY" /></property>

        <component name="price" class="org.dddtest.entities.Price">
            <property name="amount"><column name="amount" /></property>
            <many-to-one name="amountCurrency" column="curr_id"
                class="org.dddtest.entities.Currency" not-null="true" />
        </component>

    </class>
</hibernate-mapping>

From the above mapping file the two things of interest are

default-access="field"
Component tag

Field access removes the need to expose getters/setters on the entities, thus ensuring the class is well encapsulated. This was dealt with in the post on entities.

A component is an object that is treated as a value type and not as an entity. This means that the fields of the component/value class are persisted as part of some entity. In our case, the fields of Price are persisted in the Order entity/table, since the Order owns all values in Price. When Hibernate creates the Order, it creates a Price object, maps the values to it and sets this Price on the Order.

A nice feature with Hibernate components is that the same Price VO can be used with any other entity as well, since the owning entity (Order, XYZ etc) can always override mappings for Price fields. Also one component can contain another component within itself !

Using Components, any combination like Entity -> VO -> Entity or Entity -> VO -> VO etc can be created. This gives us powerful abstraction abilities to keep grouping fields into powerful domain objects and not be bound/limited by data model.

Technorati Tags: DDD, Domain Driven Design, Hibernate

Value Objects in Domain Driven Design

2008-05-15T00:01:00.008+05:30

This post is part of a series I am writing on Domain Driven Design (DDD).

"An object that represents a descriptive aspect of the domain with no conceptual identity is called a VALUE OBJECT." [Evans 2003]

Basically it means that a Value Object describes the state or characteristics of an Entity, but it has no identity of its own.

Creating Value Objects

Carving out value objects from entities is quite an art. Those fields of an entity whose values are what is important can be moved out of the entity into separate objects. Value Objects can contain references to other entities and value objects.

When is a class considered an entity and when a value object ? This discussion explains it better than anything I have read elsewhere.

If you had looked at my previous post on entities, I had said that the entity should be stripped to its bare minimum fields. So if you were wondering what happens to the rest of the fields, the answer is that they move into value objects.

To validate whether a VO you created is good, try swapping one value object with another containing the same values. If the entity does not care and works as usual, that is the sign of a good value object.

Create VO by grouping together related fields

Group related fields of an entity together. Related fields are those where

A change in one field triggers changes in multiple other fields (or)
A group of fields has a collective/shared meaning

Identify all behaviour which is based around these fields.
Move the related fields and their behaviour into a class of their own.

Always ensure that the class you created has some meaning in the domain. Never randomly group fields together for sake of creating a value object.

Create VO for fields with format restrictions

Identify fields in an entity which have restrictions on the values they can hold. E.g. a price field can be a BigDecimal with the restriction that it should have 4 digit accuracy after the decimal point, or a zipcode field has to be of a specific format.
Move such fields into separate classes with methods that enforce the restriction.
Add methods that add to the concept represented by the class. For e.g. we can add a currency code to price, and the price object can convert its value from the default currency to USD/EUR/INR etc.

This came in quite handy for us. We used JPA with Hibernate and one day we decided to change all prices from longs to BigDecimals. Since Price was its own class, changes were limited to the Price class alone.
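As a sketch of the price example, the class below keeps every amount at 4 decimal places and can convert itself to another currency. Rounding with HALF_UP and passing the conversion rate in as a parameter are assumptions; the post does not say how either was done.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Value object that enforces the "4 digits after the decimal point" restriction.
final class Price {
    private static final int SCALE = 4;

    private final BigDecimal amount;
    private final String currencyCode;

    Price(BigDecimal amount, String currencyCode) {
        // Assumed policy: normalise to 4 decimals, rounding half up.
        this.amount = amount.setScale(SCALE, RoundingMode.HALF_UP);
        this.currencyCode = currencyCode;
    }

    // Returns a new Price in the target currency; the rate is supplied by the caller.
    Price convertTo(String targetCurrencyCode, BigDecimal conversionRate) {
        return new Price(amount.multiply(conversionRate), targetCurrencyCode);
    }

    BigDecimal amount() { return amount; }

    String currencyCode() { return currencyCode; }
}
```

Because the restriction lives in the constructor, a later change of representation or precision touches only this class.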

Create VO for business method arguments/return types

I picked up this tip from this video on domain value objects. The presenter says that a good starting point to identify new VO's would be to look at args and return types of domain methods. The argument here is that in most cases they are not just DTO's/some primitive type, but they represent something in the domain.

Tip: Create VOs when you define the entities. If you find that a particular VO is not adding much value, you can always fold it back into the owning entity. Never defer defining VOs, since introducing them later is a very manual and painful effort. (Imagine changing all annotations, methods etc. to use the new VO).

Fine-Tuning the VO

Some of the interesting scenarios I faced when defining VO's

1) How do you handle multiple entities that define and use a common set of fields, with some field variations ?

In the example scenario defined here, all entities (order, box and container) have quantity related fields. All three have open and filled quantities, but the Container object also has a maximum allowed quantity and a threshold quantity. So a hierarchy of VOs was created

QuantityVO <------ LimitQuantityVO. All common fields went into the base class - QuantityVO and rest of fields went into LimitQuantityVO.
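A sketch of that hierarchy in code. The field names follow the description above; the derived methods (`totalQuantity`, `isAtOrAboveThreshold`, `canAccept`) are hypothetical examples of behaviour that could sit next to the fields.

```java
// Base VO: the fields shared by order, box and container.
class QuantityVO {
    private final long openQuantity;
    private final long filledQuantity;

    QuantityVO(long openQuantity, long filledQuantity) {
        this.openQuantity = openQuantity;
        this.filledQuantity = filledQuantity;
    }

    long openQuantity() { return openQuantity; }

    long filledQuantity() { return filledQuantity; }

    long totalQuantity() { return openQuantity + filledQuantity; }
}

// Container-only extension: adds the limit related fields.
class LimitQuantityVO extends QuantityVO {
    private final long maxAllowedQuantity;
    private final long thresholdQuantity;

    LimitQuantityVO(long openQuantity, long filledQuantity,
                    long maxAllowedQuantity, long thresholdQuantity) {
        super(openQuantity, filledQuantity);
        this.maxAllowedQuantity = maxAllowedQuantity;
        this.thresholdQuantity = thresholdQuantity;
    }

    // Illustrative rule: the threshold is reached once the total meets it.
    boolean isAtOrAboveThreshold() { return totalQuantity() >= thresholdQuantity; }

    // Illustrative rule: extra quantity fits if the total stays within the maximum.
    boolean canAccept(long extraQuantity) {
        return totalQuantity() + extraQuantity <= maxAllowedQuantity;
    }
}
```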

2) Do all value objects have to be immutable?

If VOs are mutable, it is harder to understand and control changes to them. Immutability, on the other hand, means the VO's methods cannot change its own state. So any mutation method on the VO has to return a new instance with the updated values, and the caller needs to know how to set this VO back on the owning entity. I don't like the idea of the caller tracking whether a state change occurred and then calling the corresponding update on the entity.

If your VOs rarely change, make them immutable. But making them mutable would save you a lot of trouble when coding. So it's elegance vs practicality. I have done both and don't have a strong preference for one over the other. Make your choice based on which tradeoffs you can live with.
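For the immutable flavour, one way to soften the drawback is to let the owning entity do the "set it back" step itself, so callers never track state changes. A minimal sketch, with a hypothetical `Quantity` VO and `Box` entity:

```java
import java.util.Objects;

// Immutable VO: a "mutation" returns a new instance instead of changing this one.
final class Quantity {
    private final long open;
    private final long filled;

    Quantity(long open, long filled) {
        this.open = open;
        this.filled = filled;
    }

    Quantity fill(long amount) {
        return new Quantity(open - amount, filled + amount);
    }

    long open() { return open; }

    // Value equality: two instances with the same values are interchangeable.
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Quantity)) {
            return false;
        }
        Quantity that = (Quantity) other;
        return open == that.open && filled == that.filled;
    }

    @Override
    public int hashCode() { return Objects.hash(open, filled); }
}

class Box {
    private Quantity quantity;

    Box(Quantity quantity) { this.quantity = quantity; }

    // The entity swaps in the new VO; callers just invoke the business method.
    void fill(long amount) { quantity = quantity.fill(amount); }

    Quantity quantity() { return quantity; }
}
```

The `equals` method is also what makes the swap check work: replacing a VO with another one holding the same values should make no difference to the entity.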

Part-2 of this post talks about how to implement VO's using java and hibernate and can be found here.

Technorati Tags: DDD, Domain Driven Design

Layers in DDD

2008-05-11T21:05:00.004+05:30

This post is part of a series I am writing on Domain Driven Design (DDD).

Domain Driven Design promotes a layered design with the following layers

User Interface (Presentation Layer) : Displays the information to the user and responds to user commands
Application Layer: Defines the services provided by the application and directs the user commands to the domain layer that actually does the work. This layer does not contain any business rules. However, it's not simply a facade, since it can be responsible for some high level orchestration. This orchestration adds value to this layer.
Domain Layer: All the business processes/rules in the problem domain are contained here.
Infrastructure and Technical Services Layer: Object persistence, messaging and other such technical services needed by the above layers. It will not contain any business logic but will be home to all framework code etc.

Entities in Domain Driven Design (DDD)

2008-05-11T21:00:00.016+05:30

This post is part of a series I am writing on Domain Driven Design (DDD).

"An Entity is an Object that represents something with continuity and identity. An entity is tracked through different states and implementations."

This is the definition of an entity according to DDD. So roughly, an entity is any object that is persisted in the database. Any class that is not stored in DB is not an entity.

IMHO an entity is the single most important constituent of DDD, simply because a vast majority of business logic would be owned by the entity. I use the term 'own' broadly to encompass all logic that is triggered by an entity but not necessarily present in the entity.

Creating an Entity

A good entity has to be both lean and powerful. A normal entity typically has a number of attributes. To make this entity lean and powerful,

Strip the entity to contain only its identifier attributes.
Then add methods which are core to the concept/idea represented by that entity.
Now add only attributes needed by these core set of methods.

Note: All other attributes can either be other entities or value objects. When such smaller entities are made out of a larger one, a tree of entities with one root entity governing the lifecycles of all other entities, is obtained. This is called an aggregate.

This is all that there is to an entity.

Sample Scenario for discussion

The following example will be used for discussion.

A Container contains multiple boxes, each of which contains the same product. A CustomerOrder contains multiple boxes, one of each kind of product.

Customer order (1) ---------> (n) Box
Container (1) ---------> (n) Box

So a customer order indirectly refers to (n) Containers where n is the number of distinct products in the order.

Users are allowed to add a box to, or remove a box from, the order. Duplicate boxes for the same customer order are merged.

Fine-tuning the Entity

To fine-tune the raw entity we have, we add in some more rules. These rules were based upon commonly encountered issues we faced. The main issues/rules are summarized below

1) Exposing Getters/Setters

Exposing attributes publicly leads to a loss of encapsulation. No other class should be allowed to change the state of an entity, other than that entity itself. Other entities/classes can only trigger a business method, and during execution of that business method the entity itself mutates its state.

Bottom line: Never expose getters/setters in the entities.

Tip: If you use Hibernate, use field access to avoid setters.

2) Can Entities be exposed to application services ?

Application Layer sits above the domain layer. UI talks to this layer which orchestrates the call with classes in the domain layer. The straightforward answer is yes, the entities have to be exposed to application services so that they can be called when appropriate events are triggered in UI.

However, the tricky part is whether the application services have access to all methods in the entity. In my experience, every entity has a bunch of granular methods that are exposed to other entities so that a complex workflow can be built from them. In the example above, removing a box from a container is a method exposed on the Container. Typically this will be called by Order when a product is removed from it. But is Container.removeBox a method that needs to be exposed to the application tier ? No.

Order needs access to this method but OrderApplicationService does not. When using Java, a couple of possible alternatives are

Make the method package private, provided Order and Container are in the same package.
Have an interface over Container and let OrderApplicationService use only the methods specified in the interface.

I prefer alternative 2, since then this method can be used throughout my domain layer but won't be accessible outside of it. This also removes the need to dump all entities into the same package.
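A sketch of alternative 2. `ContainerView` is a hypothetical name for the interface handed to the application layer, and boxes are reduced to id strings for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// The only face of Container that the application layer gets to see.
interface ContainerView {
    int boxCount();
}

class Container implements ContainerView {
    private final List<String> boxes = new ArrayList<>();

    public void addBox(String boxId) { boxes.add(boxId); }

    // Public so other domain classes can call it, but absent from ContainerView.
    public void removeBox(String boxId) { boxes.remove(boxId); }

    @Override
    public int boxCount() { return boxes.size(); }
}

class CustomerOrder {
    private final Container container;

    CustomerOrder(Container container) { this.container = container; }

    // Domain code works with the concrete entity and may remove boxes.
    void removeProduct(String boxId) { container.removeBox(boxId); }
}

class OrderApplicationService {
    private final ContainerView container;

    OrderApplicationService(ContainerView container) { this.container = container; }

    // container.removeBox(...) would not compile here: the interface hides it.
    int boxesLeft() { return container.boxCount(); }
}
```

The compiler enforces the boundary: the application service is typed against `ContainerView`, so the granular methods simply are not there for it to call.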

Bottom line: Entities have to be exposed but not all of its methods have to be.

Tip: Expose only relevant methods from entity to Application service layer

3) If an Entity's core method needs access to data in the DB, can it hold a reference to DAOs?

Yes. There is no harm in an entity using DAOs to look up data from the DB. If only a subset of some large set of data is to be updated, looking up just the data to modify performs better than loading the entire set through the ORM's hydration and then iterating over all of it.

This question has been discussed in great detail in the DDD forums and can be found here, here and here.

Tip: If using Spring 2.x, use the @Configurable annotation to inject the DAO reference into entities (or), if using Hibernate, inject the dependency using the onLoad method in Interceptor. More ways of injecting into entities are detailed here.

4) Can an entity use domain services ?

Domain services contain methods that do not logically belong in one entity. The only difference between these services and application services is that domain services have access to all domain objects and all operations on them. In contrast application services can see only operations exposed by interface they use.

As with DAOs, there is no harm in an entity using domain services. When looking at aggregates we will look at some interesting scenarios where domain services are used.

5) A few entities need to access properties of another entity during some of their operations. How do we control this?

See 1 and 2 above. No direct property access is to be allowed. Instead, expose business methods that mutate state and that other entities can call.

There are 2 variants of which entities need access to which method

Entities belonging to same aggregate - Typically these are all part of same package, so expose methods as package private methods.
Entities from different aggregates - Expose the methods as public but ensure these methods are not present in Interface exposed to application layer

Tip: Expose methods using the strictest access control. You will never go wrong with this.

6) What kind of Entity methods should be exposed to Application layer?

Application layer typically should not act as a low-level orchestrator. It should not be responsible for calling a sequence of fine grained methods in correct sequence. It should only call a few coarse grained methods. These coarse grained methods should hide the complex work flow from the application layer.

Bottom line: Expose only coarse grained methods.

A typical rule of thumb that we used successfully to define coarse grained is - "A method is coarse grained, if it returns leaving the domain objects in a consistent and persistable state". Meaning all entities involved should be left in a valid state i.e. all mandatory params are set, all relationships are valid. The entities involved can be persisted as-is without any changes.

For e.g. in our example above, Container.removeBox is not a coarse grained method since it leaves the removed box without any valid owner. However, a method like Container.moveBoxTo(Container newOwner) is valid since all entities are left with valid relationships.
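The rule of thumb can be sketched as below. This is only an illustration under assumptions: the Box class, the assignTo, addBox, holds and hasValidOwner methods, and the extra Box parameter on moveBoxTo are hypothetical additions so that the example is self-contained.

```java
import java.util.HashSet;
import java.util.Set;

class Box {
    private Container owner; // should never be null once the box has been placed

    boolean hasValidOwner() {
        return owner != null;
    }

    // Package-private: only entities of the same aggregate may rewire ownership.
    void assignTo(Container newOwner) {
        this.owner = newOwner;
    }
}

class Container {
    private final Set<Box> boxes = new HashSet<>();

    // Fine grained building blocks, kept package-private.
    void addBox(Box box) {
        boxes.add(box);
        box.assignTo(this);
    }

    void removeBox(Box box) {
        boxes.remove(box);
        box.assignTo(null); // the box is left without an owner here
    }

    // Coarse grained: every entity involved is in a valid state when this returns.
    public void moveBoxTo(Box box, Container newOwner) {
        if (newOwner == null) {
            throw new IllegalArgumentException("newOwner is required");
        }
        if (!boxes.contains(box)) {
            throw new IllegalArgumentException("box is not in this container");
        }
        removeBox(box);
        newOwner.addBox(box);
    }

    public boolean holds(Box box) {
        return boxes.contains(box);
    }
}
```

Only moveBoxTo would be a candidate for the interface shown to the application layer; removeBox on its own leaves a box that could not be persisted as-is.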

Technorati Tags: DDD, Domain Driven Design

A series on Domain Driven Design

2008-05-09T20:19:00.019+05:30

In a couple of previous posts I had talked about anemic domain models that were caused by reverse engineering Hibernate entities.

The powers that be, took a good interest in all of this and I was able to wrangle this into our schedule. My mandate was simple enough - find a better way to design.

I did a quick round of the top OO design guidelines: Streamlined Object Modelling, Responsibility Driven Design and Domain Driven Design.

It really helped that my company had an O'Reilly Safari account, since I could skim through all of these books online without having to hunt around for a physical copy.

Of the three alternatives, Streamlined Object Modelling focussed on reducing design to the application of a set of predefined patterns and templates. I felt there was nothing really new or revolutionary in Responsibility Driven Design. I liked Domain Driven Design since it focussed the whole design process around the domain, which made a lot of sense.

In the book Domain Driven Design, Eric Evans introduces the basic concepts and moves on to explain the various building blocks and the refactorings that have to be undertaken.

In the short term it's easier to stick to a very restrictive set of rules, but as the pressure starts mounting and you have team churn, people tend to relax their interpretation of the guidelines. So, based on my experience working on multi-year projects, I have my own take on what works and what does not in the long term. I built on top of Eric's guidelines to create my own flavour of applying DDD in a contemporary project.

I will be writing a series of posts on how to apply DDD, with detailed instructions for each individual building block. I'll also add my experiences on what worked and what didn't, and some best practices that I came across. The building blocks I intend to cover are

Why DDD / Ubiquitous Language
Layers in DDD
Entities
Value Objects Part-1, Part-2
Domain Services
Aggregates
Associations
Repositories
Policies and Specifications
Modules
Final Analysis and Conclusion

If you would like me to touch on anything in detail, let me know.

Technorati Tags: DDD, Domain Driven Design

Spring-DM and OSGi Service Proxy

2008-05-03T12:04:00.005+05:30

I've been evaluating spring-dm over equinox for some time now. If your project already has invested in Spring then Spring-DM provides support for injecting dependencies across bundles and manages tx across bundle boundaries.

However, the main feature of Spring-DM that I find actually useful is Spring's support for OSGi services. Spring adds support for declaratively publishing and consuming services. Any Spring-created bean can be published to the service registry using the XML tag osgi:service. Any service from the service registry can be referenced using the XML tag osgi:reference.

When you refer to a service using osgi:reference, Spring-DM creates a proxy over the service and injects the proxy into your code. This proxy shields the users of the service from the lifecycle of the service. So if the underlying service goes down, your code need not be aware that it should look up and bind to a new service.

The proxy is either a JDK dynamic proxy or a CGLIB proxy, depending on whether the service has an interface or not. This proxy has a number of interceptors, for transactions etc. Imagine the underlying service goes down: the next call to the proxy triggers the proxy to check if the service is up. If the service is down, the proxy blocks the call for a 'timeout' period, at the end of which a service unavailable exception is thrown. When a new service that matches the one the proxy was created for is registered, the proxy binds to that service and sends all queued requests to it. Voila, your code is blissfully unaware that a service went down and came up again.
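To make the blocking behaviour concrete, here is a rough sketch of the idea using a plain JDK dynamic proxy. It is not Spring-DM's actual implementation; RebindingProxy, GreetingService, bind and unbind are hypothetical names, and the timeout handling is simplified.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical service interface used only for this illustration.
interface GreetingService {
    String greet(String name);
}

// Handler that waits (up to a timeout) for a backing service before delegating.
class RebindingProxy implements InvocationHandler {
    private final Object lock = new Object();
    private final long timeoutMillis;
    private Object target; // null while the service is down

    RebindingProxy(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called when a matching service is registered.
    void bind(Object service) {
        synchronized (lock) {
            target = service;
            lock.notifyAll(); // wake up callers that are blocked in invoke()
        }
    }

    // Called when the backing service goes away.
    void unbind() {
        synchronized (lock) {
            target = null;
        }
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        Object service;
        synchronized (lock) {
            long deadline = System.currentTimeMillis() + timeoutMillis;
            while (target == null) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    throw new IllegalStateException("service unavailable");
                }
                lock.wait(remaining);
            }
            service = target;
        }
        try {
            return method.invoke(service, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // surface the service's own exception
        }
    }

    static <T> T create(Class<T> type, RebindingProxy handler) {
        Object proxy = Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] { type }, handler);
        return type.cast(proxy);
    }
}
```

A caller holds only the object returned by create. While no service is bound, a call blocks and finally fails with the exception; once bind is called with a new service instance, the same reference starts working again.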

However, things are not all rosy. A couple of issues I faced

1) If the bundle that has the proxies is refreshed, the currently executing method throws an exception with the message "service proxy destroyed". There is no way around this. Any new request that comes through works fine.
2) We cannot add a custom interceptor to the Spring-created proxy since it says the configuration is frozen. So I had to create a proxy on top of Spring's proxy to add my custom interceptor. This is a fairly common case and should have been possible.

PS: The custom interceptor was created to transparently handle the service proxy destroyed issue.

Tags: OSGi, Spring-DM

Troubles with OSGi - Part 1 - Proxy Creation Issues with Hibernate

2008-01-24T09:14:00.001+05:30

We are trying out OSGi as a container for our application, hoping to leverage its dynamic updates and staged rollout features. Our system needs to be updated/rolled back with 0 downtime. Yeah, we're working on *that* critical an application ;).

I have run into some pretty interesting issues using OSGi. There is not a lot of info on typical business applications that use OSGi. Some of the issues we faced were not documented and we had to get a lot of help from many forums. So the next couple of posts would be some stories around how OSGi was used and what pitfalls were faced.

All our code was packaged into small(er) osgi bundles. Following bundles were created -

Bundle 1 - All entity classes,
Bundle 2 - All business logic classes,
Bundle 3 - All DAO's ,
Bundle 4 - All client classes.

In initial stages I like to know how and why things work so that we can trouble shoot issues easily later. This made me decide against using the dynamic-imports feature. So every dependency needed by a bundle had to be manually declared in the manifest file.

Most of the initial issues got resolved quickly. But we started getting NoClassDefFoundError for HibernateProxy in our main bundle. This was really weird because the bundle that was executing the code, had imported all of spring's and hibernate's packages.

A couple of hours were spent trying to recreate the bundles, re-declare all the imports etc, but still no progress. I decided debugging was the best way to go forward. I got all spring/hibernate sources and started stepping through. Here's what I found.

Hibernate creates a proxy for any entity which does lazy loading. This proxy is a CGLIB-based proxy that implements the HibernateProxy interface and also derives from the actual entity class. When CGLIB is called to create a proxy, it creates a new class definition and the byte array representing the class. This class is then defined in the entity class's classloader.

However, in OSGi each bundle has its own classloader. So the entity bundle that contained only the POJOs had a classloader of its own, and this bundle did not import any packages from any other bundle. The business code bundle imported the entity, Hibernate and Spring bundles, so the business code bundle's classloader was wired to the other 3 classloaders. When CGLIB created the proxy and tried to define the proxy class in the entity bundle, which did not import the Hibernate packages, it threw the NoClassDefFoundError.

The fix was to import this package in the entity bundle and things were all set. But this whole issue raised 2 main concerns

1) The stack traces raised by the Equinox OSGi framework do not give detailed info on the source of an error; they only give info on the initial bundle that was executing the code when the error occurred. Is this an issue with Equinox alone, or is it the same across all other OSGi containers?
2) Even though a bundle may not directly use a class, you may still have to import it (or) use dynamic-imports and be masked from this. Either way it's still ugly.

Tags: OSGi, Spring-DM

Hibernate, Encapsulation and OO

2008-01-14T09:24:00.001+05:30

The current project I am working on, is a typical legacy application rewrite. The existing system in cobol has been around for ages and the code base had apparently got very bloated and unstructured with so many patches/fixes/what-not down the years.

One of the main issues in the old system was that the code had become un-manageable and making a small change involved poring over thousands of lines of code to find out where something was getting changed.

In the proposed Java based system, in the tradition of all enterprise applications, we would be using Hibernate and Spring. When we reverse engineer Hibernate/JPA entities from DB schemas, the Java objects created have public getters/setters. This seemingly innocuous feature is the one I have the biggest gripe with. It totally violates the whole notion of data abstraction in OO systems.

When the entities are exposed as such, it becomes easy for the layers above to change entity state. This is very convenient when writing code and lets one design classes that update multiple entities. But it also leads to the same problem which caused the rewrite in the first place. By letting anyone update entities, we allow business logic to be just dumped into any class and be called. No structure is needed.

I prefer having one gateway class where all business logic pertaining to one entity is located. This simplifies making any changes to the system and any impact analysis need not span the entire code base. But in a model where entities have public setters, this can never be enforced.

What's to prevent the current rewrite from being as messed up as the system it's replacing? Processes like code review help to a certain extent. But when push comes to shove, cutting corners becomes the norm and out go all the best practices. This leads us back into creating our own tangled web of code to replace the older tangled web. It may seem far-fetched, but after 8 years of looking at and writing all sorts of patches, fixes and features, only one thing is certain. If it can be abused, it eventually will be. Maybe even by the original developers ;)

ObjectMentor has a blog posting about this that states that jpa/hibernate entities should be treated as datastructures and not objects. They suggest adding a new layer of objects that map to the hibernate/jpa data structures and the rest of the application code uses the created objects.

This is a good idea in that we can create proper OO code that is not limited by the active record style entity objects. This can potentially even help in resolving some of the issues I had with the anemic data model.

Tags: ddd, domain driven design

Dual booting Vista and Ubuntu

2007-12-10T06:01:00.000+05:30

The last couple of weeks have been interesting. I got a new laptop, a Dell Inspiron 1720 to be precise. I got it pre-installed with Vista Home Premium edition. But I wanted to have a Linux distro to play around with. Ubuntu was the obvious choice for its ease.

Before I got around to doing that, I had to overclock my video card in Vista to get it to play Oblivion smoothly. The laptop has an 8400M GS, but when I started playing Oblivion I got a measly 30-35 fps. I had a 156.xx driver, but somehow 169.04 would never install, complaining that it could not detect any suitable driver to update. However, the 169.09 from laptopvideo2go installed properly after uninstalling all available video card drivers. After I installed RivaTuner and overclocked from 400 MHz to 500 MHz, the fps in Oblivion increased to a nice 45-50. Plus my 3DMark06 scores increased from 1276 to around 1450+.

Then it was bigger things on hand, so I downloaded ubuntu, waited for a nice saturday morning to install and configure ubuntu. List of steps followed

1) Created a new partition of 30Gb using windows partition manager.
2) Burnt the live cd iso image onto a cd
3) Booted up the laptop using the live CD and pressed install. Some sites say the Inspiron series should use the alternate CD, but the live CD worked just fine for me. (At this point I got stuck, since the installer would not return from trying to find the partitions. So I had to go to Places > Computer > OS and then retry the install to get it past this step.)
4) Rebooted once the install was done and found that my wireless would not work.

Logged into Windows and found some nice instructions on setting it up. Got that sorted out and I was all set finally !!

To finish things off, I installed all the required software using the Synaptic package manager. As a nice touch, I was even able to build ruby1.9 from source.

Some must visit links when you are planning to dual boot ubuntu with vista on a dell inspiron 1720

1) General instructions - http://apcmag.com/5046/how_to_dual_boot_vista_with_linux_vista_installed_first
2) Setting up the broadcom wireless card - http://ubuntuforums.org/showthread.php?t=297092 (I should add other instructions were not this clear and did not work properly)
3) To get soundcard installed you have to install the package - 'linux-backports-modules-generic'
4) To install ruby 1.9 and keep 1.8 follow instructions at http://ruby.tie-rack.org/28/installing-19/

Unit Testing Guidelines

2007-10-24T01:44:00.000+05:30

Ravi had blogged about the difficulties of getting people to write unit tests. I could relate to him since I have come across this problem quite often.

IMHO a developer gets turned off from writing test cases because

Most developers don't have a clue how to write proper unit tests. Most end up thinking of integration test cases instead of unit tests.
Proper unit testing (not integration testing) is hard work and needs proper design.
Effort for unit test cases is either not estimated or is under-estimated, even though we tend to have close to 2X lines of test code for X lines of code.

Many times I have had to make developers understand what a unit test is and how to approach it. These are the general guidelines I normally give them.

A unit test in my definition should

Test only one class - Even if a method in the class under test calls methods on other dependent classes, this test should be responsible only for verifying that this method works fine provided the dependent classes return correct values.
Continuing from 1, the dependent classes should have their own tests to verify all possible code flows. Doing this from a higher layer increases the # of test cases you have to write.
The unit tests for the higher layers (above the DAO layer) should use mocks (and of course Dependency Injection). Use either jMock or EasyMock to mock out calls to other layers. If you are unit testing without mocks, it means you are doing integration testing between two classes, since you verify the functionality of both.
Test boundary conditions like what happens if you pass in a null object, what happens if your dependent class throws an exception etc.
Test that the class throws all exceptions declared in @throws (and any runtime exceptions) exactly under the conditions documented
Test DAO's even if you are using ORM tools, by using an in-memory DB like Derby or HSQL
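As an illustration of the first few guidelines, here is a small sketch. To keep it self-contained it uses hand-written stubs instead of jMock or EasyMock and plain checks instead of a test framework; PriceDao, OrderTotalService and the test class are hypothetical names.

```java
// Hypothetical dependency that would normally hit the database.
interface PriceDao {
    int priceInCents(String productId);
}

// Class under test: its only job is to combine the price with the quantity.
class OrderTotalService {
    private final PriceDao priceDao;

    OrderTotalService(PriceDao priceDao) {
        this.priceDao = priceDao; // dependency is injected, so a test can stub it
    }

    int totalInCents(String productId, int quantity) {
        if (productId == null) {
            throw new IllegalArgumentException("productId is required");
        }
        return priceDao.priceInCents(productId) * quantity;
    }
}

class OrderTotalServiceTest {
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    static void runAll() {
        // Stub returns a fixed price, so only OrderTotalService is being verified.
        OrderTotalService service = new OrderTotalService(id -> 250);
        check(service.totalInCents("sku-1", 3) == 750, "total should be price * quantity");

        // Boundary condition: a null input is rejected.
        boolean rejected = false;
        try {
            service.totalInCents(null, 1);
        } catch (IllegalArgumentException expected) {
            rejected = true;
        }
        check(rejected, "null productId should be rejected");

        // Boundary condition: a failure in the dependency reaches the caller.
        OrderTotalService failing = new OrderTotalService(id -> {
            throw new IllegalStateException("db down");
        });
        boolean propagated = false;
        try {
            failing.totalInCents("sku-1", 1);
        } catch (IllegalStateException expected) {
            propagated = true;
        }
        check(propagated, "dependency failure should propagate");
    }
}
```

The real PriceDao implementation would get its own test against an in-memory database, as the last guideline suggests.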

In addition to the above, a code coverage tool like EMMA or Clover is a must-have to capture coverage and draw attention to the lesser tested parts of the application. Configure it to generate a daily/weekly report or, better still, hook it up to your CruiseControl. In most cases the developers themselves take it up as a challenge to get the code coverage up.

Anemic Entities - Fallouts of an EJB era ?

2007-10-24T00:15:00.000+05:30

When I first started working with EJBs (the 1.0 and 1.1 versions), there were two types of enterprise beans

Session Beans
Entity Beans

We were all taught to put business logic into session beans and persist them using entity beans. No business logic was present in entity beans and it generally had only getters/setters. The only reason we were encouraged to put business logic in entities was to get performance gain - EJB tips.

According to OO principles the definition of a class states that a class should contain both structure and behaviour. And we ended up violating this first principle of OO by splitting our structure(entity/vo) and behaviour(model/services) into 2 separate layers (because of our tools ??). This anti-pattern has been termed as Anemic Domain Model by Martin Fowler.

This influence sort of carried on with most of the people. Even after EJB's lost the appeal and with IOC/ORM tools gaining popularity, people still architected systems where entities/value-objects/dto were a layer of objects having just get/set methods. These objects were read from DB using DAO's and sent to model/services layer where all business processing happened.

To be fair to people, the IOC containers of the day did not support DI'ing objects read from DB using tools like hibernate. With such excuses, we lived on writing more procedural style code with OO languages.

Now Spring 2.x has started supporting dependency injection on objects whose life cycle is outside its control. With the @Configurable annotation, Hibernate can create entity/dto objects from the database, and Spring configures these objects as normal beans and wires up their dependencies.

Some more info regarding this can be found here and here.

To me, creating an architecture where I can tell the domain object to go take care of certain things leads to a very powerful API, and the system is also easy to understand.

For e.g. I would like to do things like the following in my api's.

order.ship() instead of shippingService.ship(order)
movieRental.calculateLateFees() instead of feeService.getLateFees(Rental)
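A rough sketch of what such behaviour-carrying domain objects might look like is below. The fields, the markPaid and isShipped methods and the fee rule are assumptions made up for illustration.

```java
// Rich domain object: the shipping rule lives with the order's own state.
class Order {
    private boolean paid;
    private boolean shipped;

    void markPaid() {
        paid = true;
    }

    void ship() {
        if (!paid) {
            throw new IllegalStateException("cannot ship an unpaid order");
        }
        shipped = true;
    }

    boolean isShipped() {
        return shipped;
    }
}

// The rental knows how late it is, so it can work out its own fee.
class MovieRental {
    private final int daysLate;
    private final int lateFeePerDayCents;

    MovieRental(int daysLate, int lateFeePerDayCents) {
        this.daysLate = daysLate;
        this.lateFeePerDayCents = lateFeePerDayCents;
    }

    int calculateLateFees() {
        // A rental returned early or on time owes nothing.
        return Math.max(0, daysLate) * lateFeePerDayCents;
    }
}
```

No caller has to read the order's flags or the rental's dates to apply these rules; the objects enforce them on themselves.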

Coupled with a FluentInterface, I think this should be the future of enterprise apps (well, at least till erlang/haskell become more mainstream). This would make systems cleaner and easier to maintain.

I did not make the connection between the anemic-domain-like design and EJBs till I proposed to a co-worker that we add more domain logic into the entities. The first response was

"This looks good, but shouldn't we have all business logic in separate classes like how we did it using session beans"

And then it struck me, things are not about to change for a long while !

Don't be Greedy be Dynamic

2007-05-09T08:06:00.000+05:30

If you are given unlimited number of coins of values V1, V2,… Vn etc and asked to find the minimum number of coins needed to create a Sum S then what would be the solution you would come up with ?

To better illustrate take the typical example, if you are given unlimited supplies of coins value 1, 2 and 5 and asked to create values of 8. Then one solution can be 8 = 8 coins of value 1 or 8 = 4 coins of value 2 etc but the solution that uses minimum number of coins overall would be 8 = 1 coin of 5 + 1 coin of 2 + 1 coin of 1.

Being Greedy

When I looked at it for the first time I thought the easiest way to solve this would be to act greedy.

Sort the coins in descending order, with the most valuable coin first. If the number of coins is N, then

For c = 1 to N

1) Take the value of the coin at index 'c' and see how many times it fits into the Sum required.
2) Find the modulo of the Sum with the value of the coin at index 'c'.
3) Repeat calculations 1 and 2 for the next most valuable coin, using the modulo value from step 2 as the new Sum.

The sum of the values obtained in step 1 is the number of coins required.

Applying this to get a value of 8 the steps would be

Loop1 = 5 will fit in 8 only 1 time, 8 mod 5 = 3

Loop2 = 2 will fit in 3 only 1 time, 3 mod 2 = 1

Loop3 = 1 will fit in 1 only 1 time, 1 mod 1 = 0

Number of coins needed = 3 !

Code in Java

private int[] coinArray = { 1, 2, 5 }; // must be sorted in ascending order

private int minCoinsNeededToGetCount(int neededCount) {
    int coinCountNeeded = 0;
    int tempNeededCount = neededCount;
    // Walk from the most valuable coin down to the least valuable one.
    for (int k = coinArray.length - 1; k >= 0; k--) {
        if (tempNeededCount >= coinArray[k]) {
            // Integer division gives how many coins of this value fit.
            int numCoinsOfThisTypeNeeded = tempNeededCount / coinArray[k];
            tempNeededCount = tempNeededCount - (numCoinsOfThisTypeNeeded * coinArray[k]);
            coinCountNeeded = coinCountNeeded + numCoinsOfThisTypeNeeded;
        }
    }
    return coinCountNeeded;
}

But is this the best and correct solution ?

Being Dynamic

Described as one of the two sledgehammers of the algorithms craft, Dynamic Programming is very powerful and can be used to solve a wide variety of problems.

The major thing to remember in Dynamic Programming is that we break the problem into a collection of sub problems, such that the solution to one sub problem depends on the solution of another, smaller sub problem.

In plain recursion we solve the same sub problems again and again. One of the main differences that Dynamic Programming brings over plain recursion is that here we store the results of the sub problems and do not compute them again. This is called 'memoization'.

So, applying this, what would our solution look like?

Coins for Sum 0 = 0
Coins for Sum 1 = 1 coin of Value 1+ No of coins for remaining Sum of 0= 1

-> Remaining sum 0 is got by Sum needed 1 minus coin value considered 1 = 0

Coins for Sum 2 = Min ( 1 coin of Value 1 + No of coins for Rem.Sum 1, 1 coin of value 2 + No of coins for Rem.Sum 0 ) = Min (2, 1) = 1 ;

-> Remaining sum 1 is got by Sum needed 2 minus coin value considered 1 = 1

-> Remaining sum 0 is got by Sum needed 2 minus coin value considered 2 = 0

Coins for Sum 3 = Min ( 1 coin of Value 1 + No of coins for Sum 2 , 1 coin of value 2 + No of coins for Sum 1 ) = Min (2, 2) = 2

So we take the sum required and find out the difference between that sum and various coin values and get small problems. The solution to those small sub problems are already available and we just use them to build bigger solutions.

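import java.util.Arrays;

private int[] coinArray = { 1, 2, 5 };

private void findMinCoinsNeededForSum(int sum) {
    int coinCounts[] = new int[sum + 1];
    // 999 acts as "not reachable yet"; it is larger than any real answer for small sums.
    Arrays.fill(coinCounts, 999);
    coinCounts[0] = 0;
    for (int i = 1; i <= sum; i++) {
        for (int j = 0; j < coinArray.length; j++) {
            // Sub problem left over after using one coin of value coinArray[j].
            int stateToCheck = i - coinArray[j];
            if (stateToCheck >= 0 && coinCounts[stateToCheck] + 1 < coinCounts[i]) {
                coinCounts[i] = coinCounts[stateToCheck] + 1;
            }
        }
    }
    // Print the minimum number of coins for every sum from 0 up to the requested one.
    int i = 0;
    for (int value : coinCounts) {
        System.out.println("for " + i++ + " coins needed " + value);
    }
}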

Somehow, when I wrote these 2, I felt the greedy approach was simpler to understand, and it was the first thing that came to my mind. But is it the right thing?

Given coin values of 1, 4 and 5 and asked to compute a sum of 8, greedy returns a miserable minimum coin count of 4 - one 5 and three 1's - whereas two 4's would do. So there you have the clear winner !!
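To see the difference side by side, the two approaches can be condensed into one small class and run on both coin sets. This is a compact restatement of the code above, not a new algorithm; the CoinChange class name, the method names and the -1 return for unreachable sums are my own additions.

```java
import java.util.Arrays;

class CoinChange {
    // Greedy: always take as many of the largest remaining coin as possible.
    // Expects coins sorted in ascending order; returns -1 if the sum cannot be met.
    static int greedy(int[] coins, int sum) {
        int count = 0;
        int remaining = sum;
        for (int k = coins.length - 1; k >= 0; k--) {
            count += remaining / coins[k];
            remaining %= coins[k];
        }
        return remaining == 0 ? count : -1;
    }

    // Dynamic programming: best[i] is the minimum number of coins for sum i.
    static int dynamic(int[] coins, int sum) {
        int[] best = new int[sum + 1];
        Arrays.fill(best, Integer.MAX_VALUE); // MAX_VALUE means "not reachable yet"
        best[0] = 0;
        for (int i = 1; i <= sum; i++) {
            for (int coin : coins) {
                if (coin <= i && best[i - coin] != Integer.MAX_VALUE
                        && best[i - coin] + 1 < best[i]) {
                    best[i] = best[i - coin] + 1;
                }
            }
        }
        return best[sum] == Integer.MAX_VALUE ? -1 : best[sum];
    }
}
```

For coins {1, 2, 5} and a sum of 8 both methods return 3. For coins {1, 4, 5} and a sum of 8, greedy returns 4 while dynamic returns 2.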

Memory Issues using Java ThreadPoolExecutors

2006-12-30T09:45:00.001+05:30

Java ThreadPoolExecutors are a very convenient way to make your applications concurrent. We recently began a drive to refactor most of our code to make use of these thread pools. We had a particular process where we read XML files from the filesystem, did some transformations on the XML and wrote the transformed XML into the DB. When we started testing the initial code, we ran into serious OutOfMemoryErrors for even a few hundred XML files.

This was a serious drawback. I looked at our code and found we had set our pool size to 15 and had a blocking input queue of 500, which meant at most 515 XML files were supposed to be in memory at any given point in time. This was puzzling, since that ideally should not max out memory in a 1.5 GB heap.

Roughly our process was like

XML File --> Callable --> Thread Pool (insert into Db) --> Return boolean success

The pool took a Callable that held a reference to xml and wrote it into DB.

On further analysing the code, the only suspicious thing was an innocuous looking ArrayList. This List held all the Future objects so that we could iterate through it and wait for all the inputs to be processed before terminating the process. Why would a List of Future objects cause issues?

To identify the root cause I looked into the JDK ThreadPoolExecutor implementation and found the following

1) When we create a Callable task and submit it to the ThreadPoolExecutor using any of the submit methods, a FutureTask object containing the callable is created. This is returned as return value of the submit method.

2) This FutureTask Object is a concrete class implementing both Future and Runnable. This object is the one that is submitted into the ThreadPoolExecutor. The ThreadPoolExecutor never takes a Callable task directly.

3) When the ThreadPoolExecutor is ready to execute a new task, it picks up the task (a FutureTask or a Runnable) and calls the run method on it.

4) The FutureTask object has stored the Callable object as an instance variable. The run method calls the call method, and the result returned is set into an instance variable on the FutureTask. At this point both the Callable object and the value it returned are instance variables in the FutureTask object.

5) The Callable and its results are never set to null in the Future.

So since all the Callable objects were in memory, and each callable maintained a reference to DOM object we maxed out on memory ! So we came up with a set of rules when using Callable to make life simpler.

Rules when using Callables and Futures :

A) Never maintain a huge state in Callable. The state variables will not be explicitly GC'ed as long as a reference to the Future is held.
B) If you need to have a lot of state in Callable, ensure that you clean them up at the end of the call method.
C) Never hang on to the Future indiscriminately. This will prevent the Callable and its Return value from being GC'ed.
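One way to follow these rules is sketched below, under assumptions: XmlInsertTask, BoundedFutureDemo and the byte array standing in for the XML/DOM payload are hypothetical, and the "insert" is faked. The task drops its heavy state at the end of call (rule B), and an ExecutorCompletionService hands back each Future as it completes, so the caller never keeps its own list of them (rule C).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class XmlInsertTask implements Callable<Boolean> {
    private byte[] xmlPayload; // stands in for the large DOM object

    XmlInsertTask(byte[] xmlPayload) {
        this.xmlPayload = xmlPayload;
    }

    @Override
    public Boolean call() {
        try {
            // Pretend to write the payload to the DB and report success.
            return xmlPayload != null && xmlPayload.length > 0;
        } finally {
            xmlPayload = null; // rule B: release the heavy state once done
        }
    }

    boolean holdsPayload() {
        return xmlPayload != null;
    }
}

class BoundedFutureDemo {
    // Submits taskCount tasks and returns how many reported success.
    static int runAll(int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CompletionService<Boolean> completion = new ExecutorCompletionService<>(pool);
        try {
            for (int i = 0; i < taskCount; i++) {
                completion.submit(new XmlInsertTask(new byte[1024])); // returned Future is not stored
            }
            int succeeded = 0;
            for (int i = 0; i < taskCount; i++) {
                if (completion.take().get()) { // each Future is used once and then dropped
                    succeeded++;
                }
            }
            return succeeded;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The small Boolean result is all that remains reachable from a completed Future here, so holding on to one for a little while costs very little.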

I don't understand why FutureTask needs to hold on to the Callable forever. Why can't the executing thread, on completion, set a variable called result in the Future and nullify the reference to the Callable? I don't have an answer yet, but this sounds logical to me. Can someone please educate me?

Improving performance by changing system bottlenecks

2006-07-24T13:07:00.001+05:30

We were in the process of trying to tune some code that has been around for about 2-3 years. The system is pretty straightforward. We get a bunch of files for each country. We read and process the files in 3 groups - pre, main and post process - where the processing done in each group depends on some processing in the previous group. In the post process, we read data that was written into the DB in the pre and main processes, do some processing on it and then write it back to the DB. The reason we read from the DB is that the amount of data processed in pre and main is huge and cannot be kept in memory till the post process is triggered.

We were already using threads to run the sub processes in each process (pre, main & post) in parallel unless there were any dependencies. We were using connection pools and object pools for heavy objects. We were at a loss to figure out what more could be squeezed out. We started questioning our flow..

This is the way our simplified conversation would look like.., but these questions were raised over a period of 2-3 days and not on same day.

Q: Which process takes most time
A: Post process takes as much time as pre and main combined.

Q: How much time does the value add process time in post process take
A: Reading/Writing of Data takes 98% of time and processing the data takes 2% of the time

Q: Why did we have to goto the DB in the post process to read data that we wrote into DB in the same JVM in pre and main process?
A: Coz the amount of data if held in memory would increase heap usage by close to 900MB

Q: What do we need to not read data from the DB
A: A good cache manager that persists data if heap usage increases and whose read time is better than a DB read

Q: Will things like JCache etc work?
A: They will but the keys are not single objects but are queries that have 2-3 where clause entries.

Q: Why not write our own cache implementation
A: Get a life !

Q: What is the distribution of reads/writes
A: For every 7 reads we do one write

Q: Why are reads so slow?
A: Coz its a network call u bozo

Q: Will having the DB in the same box as java process help?
A: Might, but mostly might not since we still have to go thru all 7 network layers plus the unix box is connected to DB box via a 100 Mbps dedicated link

Q: How do u get to remove the 7 network layers involved
A: Only if u put the DB process into the Java process

.. and then it dawned on us to think if using an in-memory database would remove this bottleneck. We then decided to cache all data using an in-memory database and read data from that in the post process to speed up whole process.

Then we again thought...now since there is no need for us to hit the DB, what more can be pruned off ?

Q: Dude why do we need the post process, can u tell me once again?
A: To add data from main and pre process into DB

Q: Why cant we do it in main itself ?
A: Hmm..historically it was never so.... but i think it makes sense too...but we need some data from pre process to be mixed with some data from main block so thats the reason i suppose

Q: Cant we have the pre block data mix with Main block data in main/pre blocks itself?
A: Oh yeah, only if u want to read the same file in both blocks

Q: How long does it take to read the file to get the data in pre block?
A: Hmm...not more than 20-30 seconds max..so i suppose it should be ok to read the file multiple times without any performance issues

Q: Do u still need the post process block?
A: Hmm...Most of the data mixing functionality can be done in Main/Pre blocks by parsing some files in both pre and main blocks. This way we dont have to re-query the DB for data in post block and that'll save us running around 10K queries. But still some processes need to be present in post block but they are light weight processes

Q: Hmm...we still block moving from each process block like pre to main etc waiting for the oracle queries to complete execution.
A: Hmm..thats interesting...since we are having an in-memory db the inserts to that DB are nearly 3-4 times faster than Oracle inserts. So why not block only on the in-memory queries and let the oracle inserts go on in the background? We can use a jdk5 concurrent pool to run the oracle queries in the background and let the JVM terminate when all the futures (java.util.concurrent.Future) are done.

A: You better stop now...my head's spinning....argghhh !

FYI we used Derby DB from apache as our in-memory db to speed up the process. We used two thread pools one which ran the insert statements into the In-Memory Derby DB and another that inserted into the Oracle DB. We ended up parsing a 10MB xml file twice but parsing using SAX only took around 20 seconds of our processing time so it was no biggie. Also contrary to popular belief running two queries did not degrade performance since we were not blocking on the DB that took a lot of time to execute. So we re-arranged our entire set of bottlenecks such that we only waited for the oracle inserts to complete when we were ready to end the process and shut down the JVM.
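A rough sketch of that two-pool arrangement, in case it helps picture it. Everything here is an assumption made for illustration: the class name DualStoreWriter, the method names, the pool sizes, and the two in-memory queues that stand in for the Derby and Oracle inserts (with a sleep pretending to be the slow Oracle round trip). The point is only the shape: block on the fast future, collect the slow futures, and wait for them once at the very end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: the two queues stand in for the Derby and Oracle inserts.
class DualStoreWriter {
    private final ExecutorService fastPool = Executors.newFixedThreadPool(2);
    private final ExecutorService slowPool = Executors.newFixedThreadPool(2);
    // Only touched by the calling thread, so a plain list is enough here.
    private final List<Future<?>> pendingSlow = new ArrayList<>();

    final Queue<String> fastStore = new ConcurrentLinkedQueue<>();
    final Queue<String> slowStore = new ConcurrentLinkedQueue<>();

    /** Queues the row for both stores, but blocks only on the fast one. */
    void write(String row) {
        Future<?> fast = fastPool.submit(() -> { fastStore.add(row); });
        pendingSlow.add(slowPool.submit(() -> {
            pause(20); // pretend the remote insert is slow
            slowStore.add(row);
        }));
        await(fast);
    }

    /** Called once at the end: wait for the background inserts, then stop. */
    void finish() {
        for (Future<?> f : pendingSlow) {
            await(f);
        }
        fastPool.shutdown();
        slowPool.shutdown();
    }

    private static void await(Future<?> f) {
        try {
            f.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The caller would invoke write(...) for every row during the main processing and finish() only when it is ready to shut down, which is the one place where the slow store is ever waited on.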

So what was my learning from the entire experience ...

Any performance improvement process should consist of following steps

1) Use a good profiler to profile both memory and time spent in each module
2) Identify the bottleneck processes
3) Run non-dependent processes in parallel
4) Question the flow of the process and see how a dependent process can be made into a non-dependent process. If it cannot, break the dependency into small pieces so that a process waits for the smaller dependent task to finish instead of the bigger one.
5) Do 3 and 4 again once the code is stabilized and you're still not satisfied.

Why Java pales in comparison to Ruby

2006-05-11T15:58:00.000+05:30

Ruby stands out in comparison with Java in a lot of ways. Its dynamic nature provides for a lot of interesting ways to look at programming. For someone coming in from the java world, the nice hacks that you can never do using java include

Ability to Intercept messages (aka method calls) to Objects using callbacks
Define a class instance for each object. We can change class definitions associated with one instance of an object ! You can just imagine what u can do with that.
Dynamic Typing - any object that supports a certain message can be used. So no messing around with casts. Code becomes cleaner and shorter.
Dynamically extend functionality of any class, meaning no class is sealed/final and we can define methods in the Object class itself and all objects can see the new method immediately.
Dynamically create/remove methods/classes on the fly.
Support some forms of functional programming

Anyways this list is not complete or comprehensive. It just contains a list that springs to my mind immediately.

Java purists (lovers?) can argue that, language constructs aside, things like VM optimization, non-green threads and extensive libraries that do just about anything would take a lot of time to manifest in Ruby. Both have their merits and de-merits. But for something that was built by someone in their spare time and with no big corp support (likes of Sun, IBM and Oracle) Ruby sure has come a looooong way.

Future of Java Synchronization - Escape Analysis, Lock Coarsening & FastPath

2006-05-08T11:26:00.000+05:30

The future releases of Java have a few important synchronization related changes. The new features are

1. Fast Path

Ok..to understand what FastPath means, let us look at how synchronization works in Java. Each Object in java should support synchronization but only a tiny fraction of the objects we use would ever be used for synchronizing. So the overhead (of memory) to implement sync'ing should be very minimal. Hence all the information needed during synchronization are stored in a separate class and objects of this class would be referenced when the object is being sync'ed on.

The class that holds sync info has various fields some of which are a counter holding the number of threads blocked/waiting on this object, an OS specific semaphore object (very heavyweight object) and a counter to track nested syncing on same object by same thread.

When a thread attempts to sync on an object for the first time, the JVM sees that the object has either no sync info instance associated or that the sync info instance has its counter of waiting threads at 0. Both of these mean there is no contention on this object yet. So the sync call proceeds into a fast path execution. It either creates a sync info instance and updates the address of this instance into the object (the one we're sync'ing on) using a CAS (compare and swap) instruction, or updates the owner field in the sync info instance to refer to the new thread.

If the sync info instance has a non-zero count of waiting threads (yeah we're screwed !) the JVM blocks the thread on the OS specific semaphore. This operation is heavyweight and is called slow path. I don't know why this is heavy and I can only speculate, so let me not get into that without having more info.

There are two ways of implementing this blocking: using either an infinite loop that checks whether the object has been freed (spin locking) or using the heavyweight OS semaphore to do the work. The first one is CPU intensive and can only be used if the locks are held for very short durations, while the second can be used anytime. Apparently the JVM will spin in a loop for short sync operations and use the semaphore otherwise. Also the Compare and Swap instruction itself is being replaced by something more efficient. Dunno what though.

All these things help making the fast path locking go thru even faster and attempt to implement spinning to ensure that semaphores are used as minimally as possible ensuring the sync operation itself becomes fast.
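To make the CAS and spinning ideas a bit more concrete, here is a toy lock written in plain Java. This is definitely not how the JVM implements monitors internally; SpinLock and its methods are names I made up, and the lock is deliberately non-reentrant to keep it short. It only shows the two ingredients discussed above: an uncontended acquire is a single compare-and-swap, and a contended acquire spins instead of parking on an OS primitive.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy illustration only: a non-reentrant lock built on compare-and-swap.
class SpinLock {
    // null means "free"; otherwise holds the owning thread.
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    /** Fast path: one CAS succeeds when nobody holds the lock. */
    boolean tryLock() {
        return owner.compareAndSet(null, Thread.currentThread());
    }

    /** Contended path: busy-wait until the CAS finally succeeds. */
    void lock() {
        while (!tryLock()) {
            Thread.onSpinWait(); // hint to the CPU that we are spinning (Java 9+)
        }
    }

    /** Releases the lock; only the owning thread is allowed to do so. */
    void unlock() {
        if (!owner.compareAndSet(Thread.currentThread(), null)) {
            throw new IllegalMonitorStateException("not the owner");
        }
    }
}
```

Spinning like this only pays off when the lock is held for a very short time, which is exactly the trade-off described above.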

2. Escape Analysis to help Lock elision

If an object being locked by a thread can never be accessed by another thread, no other thread can ever contend for its lock. In such cases we can eliminate the synchronization itself, making the code faster since the memory will not have to be flushed and the lock will not have to be checked or created.
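A small example of the kind of code this is aimed at. The class and method names are made up, and whether a given JVM actually removes the lock depends on its JIT and settings; the code behaves the same either way. What matters is that the lock object is created inside the method and never leaves it.

```java
// Hypothetical example: a lock object that never escapes the method.
class LocalLockDemo {
    static int sumTo(int n) {
        Object lock = new Object(); // visible to this thread only
        int total = 0;
        for (int i = 1; i <= n; i++) {
            // No other thread can ever reach 'lock', so this
            // synchronization is a candidate for elision.
            synchronized (lock) {
                total += i;
            }
        }
        return total;
    }
}
```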

3. Lock Coarsening

Consider the following code

private String getMessag()
{
    StringBuffer sb = new StringBuffer();
    // each append() acquires and releases the lock on sb
    sb.append("line one");
    sb.append("line two");
    sb.append("line three");
    return sb.toString();
}

Each of the StringBuffer.append instructions would involve locking on the string buffer itself. But we can see three calls to this method meaning we would do the following operations thrice

1. Flush of memory to main memory
2. Acquire Lock on String Buffer
3. Flush of memory to main memory

Since the above function can be re-interpreted as

private String getMessag()
{
    StringBuffer sb = new StringBuffer();
    // one lock acquisition covers all three appends
    synchronized(sb)
    {
        sb.append("line one");
        sb.append("line two");
        sb.append("line three");
    }
    return sb.toString();
}

we can cut down the number of sync calls to only one. This is called lock coarsening since we effectively coarsen the lock.

Some of these 3 features are in Mustang (Java 6) while most are in Dolphin (Java 7). Mustang already does some escape analysis, but apparently the info is not used to do lock elision, and only some simple coarsenings are supported currently. So expect to wait longer till your code can run fast. To know more on this you can refer to David's blog entry.

First Experience with Ruby

2006-04-20T07:35:00.000+05:30

It seems Ruby is becoming omnipresent these days. A lot of hype, hope and buzz seems to surround it. A few of my friends moved into ruby and went ooh-aah over the newfangled toy that does wonders. But when i tried to get them to tell me what was so interesting and powerful, the common answer i got was that it was so simple to write code. Sufficiently piqued i decided to get some first hand experience.

So i borrowed a PragmaticProgrammer book on Ruby (online version can be found here) and started reading thru it. It is a good book with the only grouse being some of the topics were not covered in depth as i would have preferred.

Anyways after poring through half the book the list of features that spring out are

1. Fully OO

Everything seems to be a method connected to an object including new, loops etc. For e.g. to create a new instance of my object you do MyObject.new(). Numbers (java equivalent of int's) are all instances of the Fixnum class so you can do things like 3.times, 3.step(30, 3). Both the 3.X methods are looping structures with the first one looping 3 times and the second behaving like a java for loop.

2. Iterators and Blocks

Ruby defines a bunch of iterators that operate over Ruby containers (Arrays, lists and hashes). Coming from the java world, to me an iterator just meant a way to loop through the contents of a collection. But a Ruby iterator is a different beast that not only loops through the contents but also executes a 'Block' of code for each object in the collection. To make it seem less daunting, lets look at an example..

[ 1,2,3,4,5 ].each {|i| print i } This would print 12345 on the console.

In Java, we'd have to do..

int numArr[] = {1,2,3,4,5};
for ( int i = 0; i < numArr.length; i++ ) { System.out.print(numArr[i]); }

3. Closures

A closure is a block of code that retains the values of the instance/local variables it accesses or uses. So we can create a closure, which accesses a few local variables defined and set above it. This closure when passed to a method in a different class will retain the local variable values as it was in the original context.

Martin Fowler has a very good article on closures. Read it to know abt the power of closures.
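For Java folks, the closest thing is a lambda (Java 8 and later, so not something available when this was first written). A tiny sketch with made-up names: the lambda below keeps using the local variable greeting even after the method that defined it has returned. Note that Java only lets a lambda capture locals that are effectively final, which is more restrictive than Ruby blocks.

```java
import java.util.function.Supplier;

// Hypothetical example of a closure capturing a local variable.
class ClosureDemo {
    static Supplier<String> greeter(String name) {
        String greeting = "Hello, " + name; // local to this call
        // The lambda retains 'greeting' after greeter() has returned.
        return () -> greeting + "!";
    }
}
```

Calling ClosureDemo.greeter("Ruby") hands back an object that can be passed to any other class, and invoking get() on it there still sees the value from the original context.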

4. Assignments and Operator Overloading

Assignments can be chained since each assignment returns that value as the result of the assignment expression. So doing a = (b = 1 + 2) + 3 will set a to 6 and b to 3. Also both if and case return the value of the last expression executed, so the result of an if or case expression can itself be assigned to a variable.

Also operator overloading is supported. Damn java seems to be the only one not supporting it these days.

5. Getter/Setter

We can have getters/setters for object variables without writing methods. In ruby classes attr_reader and attr_writer followed by variable names will make the attributes readable/writeable.

6. Mixins

Ruby, like Java, supports only single inheritance. Mixin is a powerful concept where you define modules and, when you include them in your class, all the methods in the modules become accessible in your class. Including a module does not copy its code into the class, it just references it, so any change to the module's methods/variables is seen by every class that includes it.

For e.g. ruby comes with a standard module Comparable which defines comparison operators. For Comparable to work we have to implement the method <=> in our class, which the module uses. This is similar to an abstract base class in java, except that we are allowed to have multiple abstract base classes for a single class. Pretty powerful !!
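The nearest Java equivalent I can think of today is an interface with default methods (Java 8+). It is only an approximation of a mixin, and the names Ordered and Release are invented for this sketch: the class supplies compareTo (playing the role of <=>) and gets the other comparison methods for free.

```java
// Hypothetical mixin-like interface: implement compareTo, get the rest for free.
interface Ordered<T> extends Comparable<T> {
    default boolean lessThan(T other) {
        return compareTo(other) < 0;
    }

    default boolean greaterThan(T other) {
        return compareTo(other) > 0;
    }

    default boolean between(T low, T high) {
        return compareTo(low) >= 0 && compareTo(high) <= 0;
    }
}

// Only the comparison itself is written by hand.
final class Release implements Ordered<Release> {
    final int number;

    Release(int number) {
        this.number = number;
    }

    @Override
    public int compareTo(Release other) {
        return Integer.compare(number, other.number);
    }
}
```

A class can implement several such interfaces at once, which is the "multiple abstract base classes" flavour described above.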

So Ruby has some features i like (all the ones above) but its arrays and collections are not strongly typed and i have a general aversion to such things having some bad memories from a VB/ASP project long ago.

Overall i have to think about some real life problems that ruby makes it easy to solve and actually scales well before actually going nuts over it.

10 Tips for good API Design

2006-04-19T07:53:00.000+05:30

All along my life, i had complete control over my code. I could change method/class names, signatures at will and all users would be in the same code base. I could do endless refactoring. Now I have to maintain and keep enhancing an API. Sounds easy ? Its actually big trouble, every method you publish would end up being called by thousands of people and you lose your ability to refactor in a jiffy.

Public API's are forever and there is only one chance to get them right - Joshua Bloch

I learnt good API design is an art in itself and i also picked up some basic rules that should be followed. They are

1.Intuitive APIs

When designing an API, always consider specific examples of what code clients should write to use it. Model api's after commonly used usage patterns/api's. This will help us avoid cumbersome or unintuitive APIs.

2.Internal Code

All internal code that should not be called by public should go in a separate package marked internal. Like all public classes in com.company.product.* and internal classes in com.company.product.internal.*.

3.Expose only what's needed

Use the most stringent access specifiers possible (private/package private/protected/public). The idea is to prevent unintended usage of the api's. Some common tips
1) All classes/functions that should only be called by other classes in the same package should be declared package private. Note that interface methods cannot be package private: an interface itself can be, but all of its methods are implicitly public.
2) If a method needs to be called by classes in a different package then the tendency is to make it public even though it is not a true public api. In such cases, a good way to proceed is to make the class as abstract, then subclass this class in the private packages and use static factory methods to return an instance of the internal class. This way the methods would be exposed in private packages only.
3) Declaring a method as protected is as good as declaring it as public since any one can subclass and be able to call the method. It should be used only when it is sure that clients should be able to override them or subclasses are in different packages.
4) Make classes/methods final so that they cannot be overridden. Turning it around: assume clients will extend any public class that is not final, and it is unpredictable to let clients override selected methods without understanding the big picture.
5) No member variables should be public except constants (public static final)

4.Javadoc

All public methods/constants in a public class must be documented. The ease of use of your api depends on how good the javadoc is. A good clean javadoc means there is no implementation detail specified. All operations on data that are visible to clients are well documented, as are error conditions. Specify clearly what each method will and, more importantly, will not do.

5.Static Factory creation

Prefer using static factory methods to create objects rather than using new. This allows us to return any object that implements the specified contract rather than exposing a truckload of classes to the public. Use of a constructor forces the implementation to return an instance of the class itself rather than a subclass (or one of a set of subclasses). This would force the class to have knowledge of all variant behaviours.
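A minimal sketch of the idea, with invented names (Shape, circle, square). In a real api the abstract class would be public and the implementations could live in an internal package; here they are private nested classes just to keep the example in one place. Clients only ever see Shape and its factory methods.

```java
// Hypothetical API type: clients see only Shape and its factories.
abstract class Shape {
    private Shape() {
        // no subclasses outside this file
    }

    abstract double area();

    /** Factory methods choose the concrete class; callers never name it. */
    static Shape circle(double radius) {
        return new Circle(radius);
    }

    static Shape square(double side) {
        return new Square(side);
    }

    private static final class Circle extends Shape {
        private final double radius;

        Circle(double radius) {
            this.radius = radius;
        }

        @Override
        double area() {
            return Math.PI * radius * radius;
        }
    }

    private static final class Square extends Shape {
        private final double side;

        Square(double side) {
            this.side = side;
        }

        @Override
        double area() {
            return side * side;
        }
    }
}
```

Because callers write Shape.circle(1.0) instead of new Circle(1.0), the implementation classes can be renamed, merged or replaced later without touching client code.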

6.Contexts

Dont store contexts in objects. Prefer to pass them as method parameters. If we store contexts in objects and pool those objects, then we have to ensure that the context is valid throughout object lifecycle. This is a big pain.

7.Thread safe classes

Immutable classes and fly weights are easily thread safe. Prefer to use them judiciously.

8.Helpers & Utils

Any util class that needs to be associated to a state should be made a helper that has to be instantiated with the state. Utils should be final and non instantiable and all methods should be static.
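To show the difference between the two, here is a small sketch with made-up names (TextUtils and Indenter). The util is final, cannot be instantiated and has only static methods; the helper carries state, so it has to be created with that state.

```java
// Hypothetical util: stateless, final, non-instantiable, all static.
final class TextUtils {
    private TextUtils() {
        throw new AssertionError("no instances");
    }

    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}

// Hypothetical helper: tied to a piece of state, so it must be instantiated.
final class Indenter {
    private final String prefix;

    Indenter(String prefix) {
        this.prefix = prefix;
    }

    String indent(String line) {
        return prefix + line;
    }
}
```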

9.Class Names and Method names

Make names consistent throughout the api, e.g. the length() method returns the length of both String and StringBuffer. If you get the naming correct, most people will be able to use the api intuitively without poring over the docs. This makes the api's very simple to use.

10. Exceptions

Never throw a single type of exception with different messages. Use lots of different exception classes, the rough rule being one exception class for each kind of error. Be sure to put them in a proper hierarchy (like the one under java.io.IOException) so that clients have the flexibility to declare a single catch block for a general category of exceptions.
Always ensure that the exception message is picked up from a resource bundle or some other file so that messages can be easily translated. Hey, you never know who is going to use your api's !
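A small sketch of such a hierarchy. All the names (StorageException, InvalidNameException, QuotaExceededException, Storage.save) are invented for illustration, and the messages are hard-coded here rather than read from a resource bundle to keep it short.

```java
// Hypothetical hierarchy: one base type per category, one subclass per kind of error.
class StorageException extends Exception {
    StorageException(String message) {
        super(message);
    }
}

class InvalidNameException extends StorageException {
    InvalidNameException(String message) {
        super(message);
    }
}

class QuotaExceededException extends StorageException {
    QuotaExceededException(String message) {
        super(message);
    }
}

class Storage {
    /** Throws a specific subclass so callers can be as precise as they like. */
    static void save(String name, long bytes, long quota) throws StorageException {
        if (name == null || name.isEmpty()) {
            throw new InvalidNameException("file name is empty");
        }
        if (bytes > quota) {
            throw new QuotaExceededException(bytes + " bytes exceeds quota of " + quota);
        }
        // the actual write is omitted in this sketch
    }
}
```

A client that cares about quotas can catch QuotaExceededException on its own, while one that does not can simply catch StorageException for the whole category.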

When in doubt leave it out. We can always go in and add something later after more deliberation.

Why a ConcurrentHashMap is so fast

2006-02-27T08:27:00.000+05:30

We all know a ConcurrentHashMap is nearly as fast as a HashMap and also provides concurrency like a Hashtable. Some very ingenious coding has made this possible. To understand the hows and whys, the key is the structure of the ConcurrentHashMap itself.

A ConcurrentHashMap contains a final array of Segment objects. Each segment extends ReentrantLock, and contains a transient volatile array of HashEntry objects. Each HashEntry object is made of final key, hash and next variables and a volatile value variable.

Every get/put/remove/add operation involves creating an index from the hashkey that maps to one of the available segment objects. The call is then delegated to the obtained segment instance.

Let's take a put call. The segment first locks itself (since it extends ReentrantLock). It then checks if the key already exists; if it does, it updates the volatile value field in the HashEntry object. Since it is a volatile variable, the new value is guaranteed to be seen by other threads in the jvm without any explicit need for synchronization. Voila ! Now if the key is not present then a new HashEntry object is created and added to the head of the existing list. Since there is an array of Segments, the writes can be spread out and not all threads will lock on the same segment, leading to more concurrent writes.

Consider a get call. Get the first HashEntry and iterate till the end, and if the key is found return the value. No locking at all. Same as a HashMap but unlike a Hashtable. So all reads work at nearly the same speed as a HashMap.

There are 2 things that make this possible

1) The new JMM guarantees that volatile reads are not re-ordered with volatile writes, and all reads after a write will get the updated contents without synchronization. The variable that holds the value reference in HashEntry is volatile, and so is the whole HashEntry array in each segment. So any change to a value, and every newly added HashEntry object (or key-value pair), is visible to all threads once it is assigned, without any synchronization.

2) Initialization safety for final fields: all threads will see the values of an object's final fields that were set in its constructor. Further, any variables that can be reached through a final field of a properly constructed object, such as fields of an object referenced by a final field, are also guaranteed to be visible to other threads as well. So if a new key-value pair is added and is instantly accessed by another thread, the key, hashkey and next pointer will have proper values and never null.

These ensure that any add in any thread instantly reflects in other threads, without any flushing of memory. Does this mean no locking at all ? None in the java code, but the jvm implementation will have to do some locking to ensure that volatile variable reads return the latest written values. Since it's at a much lower level it should be faster.
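To tie the pieces together, here is a heavily simplified sketch of the structure described above. It is not the real java.util.concurrent code: StripedMap is a made-up name, each segment holds a single chain instead of its own hash table, and there is no remove or resize. What it does show is the lock-per-segment put, the lock-free get, the final key/hash/next fields and the volatile value.

```java
import java.util.concurrent.locks.ReentrantLock;

// Simplified sketch, NOT the real ConcurrentHashMap implementation.
class StripedMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        final int hash;
        final Entry<K, V> next;
        volatile V value; // writes become visible to readers without locking

        Entry(K key, int hash, V value, Entry<K, V> next) {
            this.key = key;
            this.hash = hash;
            this.value = value;
            this.next = next;
        }
    }

    private static final class Segment<K, V> extends ReentrantLock {
        // Simplification: one chain per segment instead of an array of chains.
        volatile Entry<K, V> head;

        /** Lock-free read: walk the chain through final 'next' links. */
        V get(Object key, int hash) {
            for (Entry<K, V> e = head; e != null; e = e.next) {
                if (e.hash == hash && e.key.equals(key)) {
                    return e.value;
                }
            }
            return null;
        }

        /** Write under this segment's own lock; other segments stay free. */
        V put(K key, int hash, V value) {
            lock();
            try {
                for (Entry<K, V> e = head; e != null; e = e.next) {
                    if (e.hash == hash && e.key.equals(key)) {
                        V old = e.value;
                        e.value = value; // volatile write
                        return old;
                    }
                }
                head = new Entry<>(key, hash, value, head); // new entry at the head
                return null;
            } finally {
                unlock();
            }
        }
    }

    private final Segment<K, V>[] segments;

    @SuppressWarnings("unchecked")
    StripedMap(int segmentCount) {
        segments = (Segment<K, V>[]) new Segment[segmentCount];
        for (int i = 0; i < segmentCount; i++) {
            segments[i] = new Segment<>();
        }
    }

    private Segment<K, V> segmentFor(int hash) {
        return segments[(hash & 0x7fffffff) % segments.length];
    }

    V get(Object key) {
        int hash = key.hashCode();
        return segmentFor(hash).get(key, hash);
    }

    V put(K key, V value) {
        int hash = key.hashCode();
        return segmentFor(hash).put(key, hash, value);
    }
}
```

Two threads writing keys that land in different segments never wait on each other, and readers never take a lock at all.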

Never seen an API that uses the features of the JMM to this extent. Hats off to Doug Lea who made this all possible.

Time based UUID Generation Algorithm

2006-01-10T15:55:00.000+05:30

We had a requirement recently that we should map files to UUIDs. This gives us the flexibility to refer to a file without using a name thereby enabling us to rename it. So we dug a lil bit on UUIDs. java.util.UUID is the UUID implementation in java and this is the RFC its linked to.

So basically a UUID (java.util.UUID) represents a 128-bit value. These bits are split as

32 bits time_low
16 bits time_mid
16 bits time_hi_and_version
16 bits clock sequence
48 bits node

Timestamp is a 60 bit value of the UTC as a count of 100-nanosecond intervals since 00:00:00.00, 15 October 1582.

Clock Sequence is used to help avoid duplicates that could arise when the clock is set backwards in time or if the node ID changes. The clock sequence MUST be originally (i.e., once in the lifetime of a system) initialized to a random number to minimize the correlation across systems. If the previous value of the clock sequence is known, it can just be incremented; otherwise it should be set to a random or high-quality pseudo-random value.

Node: For UUID version 1, the node field consists of an IEEE 802 MAC address, usually the host address. For UUID version 3 or 5, the node field is a 48-bit value constructed from a name. For UUID version 4, the node field is a randomly or pseudo-randomly generated 48-bit value.
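Here is a small sketch of how those fields fit into the two longs that java.util.UUID is built from. The class and method names (UuidV1Layout, timestampFromUnixMillis, pack) are mine, not part of any library; the only real API used is the UUID(long, long) constructor. The constant is the number of 100-nanosecond intervals between 15 October 1582 and the Unix epoch.

```java
import java.util.UUID;

// Hypothetical helper showing the version-1 bit layout.
class UuidV1Layout {
    // 100-ns intervals from 1582-10-15 00:00:00 UTC to 1970-01-01 00:00:00 UTC.
    static final long GREGORIAN_OFFSET_100NS = 0x01B21DD213814000L;

    /** Converts Unix milliseconds to the 60-bit UUID timestamp. */
    static long timestampFromUnixMillis(long unixMillis) {
        return unixMillis * 10_000L + GREGORIAN_OFFSET_100NS;
    }

    /** Packs timestamp, clock sequence and node into a version-1 UUID. */
    static UUID pack(long timestamp, int clockSeq, long node) {
        long timeLow = timestamp & 0xFFFFFFFFL;                            // 32 bits
        long timeMid = (timestamp >>> 32) & 0xFFFFL;                       // 16 bits
        long timeHiAndVersion = ((timestamp >>> 48) & 0x0FFFL) | 0x1000L;  // 12 bits + version 1
        long mostSigBits = (timeLow << 32) | (timeMid << 16) | timeHiAndVersion;

        long clockSeqAndVariant = (clockSeq & 0x3FFFL) | 0x8000L;          // 14 bits + variant
        long leastSigBits = (clockSeqAndVariant << 48) | (node & 0xFFFFFFFFFFFFL);
        return new UUID(mostSigBits, leastSigBits);
    }
}
```

A UUID built this way can be read back with the timestamp(), clockSequence() and node() methods of java.util.UUID, which only work for version 1 UUIDs.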

There are four different basic types of UUIDs: time-based, DCE security, name-based, and randomly generated UUIDs. These types have a version value of 1, 2, 3 and 4, respectively. Lets look at the time based UUID generation algo.

Time based UUID creation Algorithm

These are the steps as present in the RFC. My comments follow the dash at the end of a step.

1) Obtain a system-wide global lock - How ? Simple use a java.nio.channels.FileLock ! This is what the java.util.logging framework uses to ensure log entries are not overwritten when used from multiple JVMs.
2) From a system-wide shared stable store (e.g., a file), read the UUID generator state: the values of the timestamp, clock sequence, and node ID used to generate the last UUID.
3) Get the current time as a 60-bit count of 100-nanosecond intervals since 00:00:00.00, 15 October 1582.
4) Get the current node ID.
5) If the state was unavailable (e.g., non-existent or corrupted), or the saved node ID is different than the current node ID, generate a random clock sequence value - If someone deletes the state store file, then since we start off with a random number we can still get a unique UUID.
6) If the state was available, but the saved timestamp is later than the current timestamp, increment the clock sequence value. - Ingenious !! This means if u revert back ur clock the UUID will still remain unique.
7) Save the state (current timestamp, clock sequence, and node ID) back to the stable store.
8) Release the global lock.
9) Format a UUID from the current timestamp, clock sequence, and node ID values.
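The steps above can be sketched roughly as follows. Big assumptions to keep it short: the state lives in fields instead of a shared file, a plain synchronized block stands in for the system-wide lock (so this only protects a single JVM), the timestamp and node ID are passed in by the caller, and the case of two requests with the same timestamp is not handled. TimeBasedState is a made-up name.

```java
import java.security.SecureRandom;

// Sketch only: in-memory state and an in-process lock replace the
// system-wide lock and stable store that the RFC asks for.
class TimeBasedState {
    private final Object lock = new Object();
    private final SecureRandom random = new SecureRandom();

    private boolean hasState = false; // false until the first UUID is generated
    private long lastTimestamp;
    private long lastNode;
    private int clockSeq;

    /** Returns {timestamp, clockSeq, node}, the inputs for formatting the UUID (step 9). */
    long[] next(long timestamp, long node) {
        synchronized (lock) {                               // steps 1 and 8
            if (!hasState || lastNode != node) {            // steps 2 and 5
                clockSeq = random.nextInt(1 << 14);         // random 14-bit value
            } else if (lastTimestamp > timestamp) {         // step 6: clock went backwards
                clockSeq = (clockSeq + 1) & 0x3FFF;
            }
            lastTimestamp = timestamp;                      // step 7
            lastNode = node;
            hasState = true;
            return new long[] { timestamp, clockSeq, node };
        }
    }
}
```

If the clock is moved back between two calls, the timestamp repeats or goes down but the clock sequence changes, so the resulting UUIDs still differ.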

The algorithm looks very simple and elegant, and though there are other ways of getting UUIDs this is the one that is easy to understand.