I've fiddled with my blog template because I decided I wanted more horizontal viewing space, given that it was using less than a third of my 1920 horizontal pixels. If it feels too spread out for you, I added a drag-and-drop handle over to the left to let you resize the main content column. The javascript is pretty primitive. If it breaks, drop me a comment.
>
>
>
>

Wednesday, October 1, 2008

The First Chapter

Here's an interesting problem I came across the other day related to transaction demarcation.
I'm writing some message-driven code using a JMS broker--specifically, ActiveMQ: http://activemq.apache.org/. A lot of the processing is actually running in the broker JVM itself using Apache Camel to move messages around: http://activemq.apache.org/camel. We recently had trouble with some of the processing code when we first deployed the message broker to a staging environment and tried to push several thousand messages through it. It ran through a bunch of the messages just fine, then it stopped--no errors in the log, no crash of the broker, no CPU use... nothing. It just sat there grinning stupidly at me. Here's how the message processing works...
New message comes in. Get the important details out of it and invoke a processMessage method that does everything needed to put the message into our database. The database is Oracle in staging and production but Apache Derby in development. The processMessage method is wrapped with a typical transaction using Spring's declarative transaction management. Inside that method, another method is invoked that’s configured with PROPAGATION_REQUIRES_NEW. There are a few factors working together that make this necessary, but they’re not relevant here.
It goes something like this:
  1. Begin outer transaction/open Hibernate session 1
  2. Do some transformations and creation of object A to represent message
  3. Get related object B from database or an external source
  4. If object B came from outside, save it with session.save(B) (session 1)
  5. Associate object A to object B
  6. Begin inner transaction/open session 2
  7. Save object A with session.save(A) (session 2)
  8. Commit inner transaction/close session 2
  9. Attach object A to session 1 for continued processing
  10. Maybe make other changes to object A
  11. Commit outer transaction/close session 1
Here's how the problem manifested...
When the broker froze up, the logs stopped right after a SOAP call was made for object B and we tried to save it. Significantly, it was the first time since the broker was started that we’d had to make a SOAP call and save the result to the database. The last thing the logs record is that B was retrieved and saved successfully, but it never made it into our database. We ruled out the possibility that the problem was in the data being retrieved. It was definitely a problem with the code running in the broker.
That’s it. Can you figure out what’s wrong? Read on for the answer.
Hint 1: Object B has unique constraints on some of its fields in the database.
Hint 2: I’ve left out some details about how objects A and B are mapped. I didn’t have the information on hand when I was analyzing the problem and was forced to make some assumptions that allowed me to reach a solution. I still don’t know exactly how they’re mapped, because I already fixed the problem without looking at the mappings. Assume that the relationship from A to B is mapped at least as cascade=”save-update”, and therefore object B will be saved when object A is saved in step 7 above.
Hint 3: It’s all in the transactions. Look there.
What’s going on: I made one other assumption to explain things related to the session.save(B) call in step 4 above. That call will cause an insert statement to be issued for B at some point. My assumption was that, for some reason, the session is being flushed and this insert is happening before the inner transaction is begun in step 6. (I can’t give you any more information on this right now, as I never verified this assumption, either.)
Given these two assumptions, here’s what’s happening at the database level:
  1. Open connection 1, autocommit off
  2. Insert B on connection 1
  3. Open connection 2, autocommit off
  4. Insert A on connection 2
  5. Insert B on connection 2
  6. Commit on connection 2
  7. Commit on connection 1
Do you see the problem now? If not, you probably need one more crucial piece of information: earlier uncommitted statements against an Oracle database will block later, conflicting statements from being committed until the earlier one is committed or rolled back. More specifically, if you try to insert object B on two different connections simultaneously, the second one can’t know if it should succeed or fail until the first one is committed--remember B has unique fields! If you look back at my sequence of database events, you’ll see that I was trying to commit the second connection and then the first connection.
Tie all this together, and you have a deadlock between a single thread and the database. Connection/transaction 2 waits for connection/transaction 1 to commit or rollback, and connection/transaction 1 is unable to do either because execution is stuck waiting for connection/transaction 2 to finish doing what it’s going to do.
Nice, huh?
Solution: Make sure object B only gets associated to one session, the one tied to the inner transaction, so that it’s inserted only one time--along with object A.
Final note: This problem never cropped up in development because Derby apparently doesn’t care about database locks. It seems to blow right past many of them without batting an eye. I discovered this while experimenting with ActiveMQ using JDBC persistence in a Master/Slave failover configuration.

No comments: