We've been having a fascinating discussion on the OpenSymphony administrators mailing list. It all started with James expressing his frustration about the state of open source Enterprise Java application servers:
(When he says Enterprise, he means Enterprise - not toy)
Seriously, (putting it a bit more back on topic) is there any clustered configuration with support for high-volume XA-transactions that you trust to deploy your WebWork, Hibernate, OSWorkflow, Quartz, and SiteMesh based apps on, that doesn't cost an arm and a leg? Seems to me that the Java community is seriously lacking any reasonably cheap offering that includes a solid TM in its stack. Is there an option I haven't thought of, or are all cheap JEE options truly missing XA support? Is there a hole to fill here (a true fully implemented TM)?
After much to-ing and fro-ing, it was decided that yes - if you need 2 phase commit, XA across JDBC and external JMS, fast performance, session + server clustering and XA with recovery - you're going to be shelling out for an expensive app server (really WebLogic or WebSphere).
My contention is that the reason the Open Source community doesn't provide this yet is quite simple.
Most people don't need any of that to build their Java/J2EE applications. When I say most, I mean the 98% of people who aren't banks and who, although they understand 2PC, have never actually needed it in their lives. The other 2% are probably banks and pay for WebLogic or WebSphere as a part of an 'enterprise software license agreement' all-you-can-eat-buffet deal anyway.
James' contention was:
I believe you're mostly right, however I talk to developers (or read forum messages from developers) all the time who either:
- Don't understand that they need XA to use JDBC + JMS at the same time,
- Don't understand that they need XA to use two JDBC connections at the same time,
- Do understand this, but use an app server with "faked" XA, without knowing it,
- Do understand this, and think that Spring gives it to them (not realizing Spring just delegates to the possibly broken TM)
Now I don't disagree with him. XA is something I think very few J2EE developers really understand when you push them on it. However, I'd like to answer these points one by one.
Like I said, 98% of developers:
Are you in the 98%?
It's nice to know that as your project grows, J2EE stays right there with you!
rebelutionary: 98% of Java developers
An interesting comment on whether or not you actually need the whole J2EE stack. rebelutionary: 98% of Java developers “just don’t need it” But then, where do you put people who actually need it but work in a company that has choos...
I'm in the 2% and for what it's worth, XA doesn't work on the big-money app servers either.
I mostly agree with you, and the answers you give to the 4 points seem reasonable, but in my experience using more than one DB is really not that uncommon. For example, my current project uses 4 different DBs at this time (these are all considered “legacy”, although they are normal Oracle RDBMS), and in quite a few cases we need connections to our application DB as well as a “legacy” DB at the same time to do some matching or other operation.
Are you aware that JBoss purchased Arjuna and will be releasing their high performance transaction manager as open source? One more victory for open source.
Even if you have Weblogic, 2PC is not available if you are using MQSeries in a distributed environment, since it does not support it except in "binding" mode.
I think you under-estimate point 2 a little. It isn't two databases, but two physical connections. If you use a connection pool of some sort and your application generally gets a connection from the pool, uses it, then closes it (a reasonably common practice), then your pool either has to guarantee that the same underlying connection is used by all code participating in the transaction or you cannot *guarantee* TX consistency. While it is pretty rare for a database to fail at commit, it does sometimes happen.
I generally agree with you, though I'm not sure 98% is the number. I think it's one of those 'programming in the small' (writing a small stand-alone web app that just uses a database) versus 'programming in the large' (writing one service/application within an enterprise which is integrated with other parts of the business) distinctions.
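To make the pooling requirement concrete, here is a minimal sketch of a pool that pins one physical connection to the current thread for the life of a transaction. It is a toy model under stated assumptions, not how any particular pool or app server does it: `TxBoundPool`, `PhysicalConnection`, `begin()` and `end()` are hypothetical names, and no real JDBC classes are involved.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy pool that hands the same connection to all code in one transaction. */
final class TxBoundPool {
    /** Hypothetical stand-in for a physical database connection. */
    static final class PhysicalConnection {
        final int id;
        PhysicalConnection(int id) { this.id = id; }
    }

    private final Deque<PhysicalConnection> idle = new ArrayDeque<>();
    // The connection pinned to the transaction running on the current thread, if any.
    private final ThreadLocal<PhysicalConnection> bound = new ThreadLocal<>();

    TxBoundPool(int size) {
        for (int i = 0; i < size; i++) {
            idle.push(new PhysicalConnection(i));
        }
    }

    /** Starts a transaction by pinning one idle connection to the calling thread. */
    synchronized void begin() {
        if (bound.get() != null) throw new IllegalStateException("transaction already active");
        if (idle.isEmpty()) throw new IllegalStateException("pool exhausted");
        bound.set(idle.pop());
    }

    /** Every caller inside the transaction receives the same pinned connection. */
    PhysicalConnection getConnection() {
        PhysicalConnection c = bound.get();
        if (c == null) throw new IllegalStateException("no active transaction");
        return c;
    }

    /** Ends the transaction and only then returns the connection to the pool. */
    synchronized void end() {
        PhysicalConnection c = bound.get();
        if (c == null) throw new IllegalStateException("no active transaction");
        bound.remove();
        idle.push(c);
    }

    public static void main(String[] args) {
        TxBoundPool pool = new TxBoundPool(2);
        pool.begin();
        PhysicalConnection first = pool.getConnection();  // e.g. used by a DAO
        PhysicalConnection second = pool.getConnection(); // e.g. used by another DAO
        System.out.println("same connection: " + (first == second));
        pool.end();
    }
}
```

The design choice is that "closing" a connection inside the transaction does nothing; the physical connection goes back to the pool only when the transaction ends, so a single local commit covers all the work.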
If you need to communicate with another system developed by another team/company using JMS then the chances are you need something like XA or an alternative - as by definition there will be multiple databases and the JMS to work with.
BTW here's an alternative which doesn't require XA...
http://activemq.org/Should+I+use+XA
FWIW Geronimo has a TM which implements recovery. ActiveMQ supports XA and recovery - as does the TranQL JCA ResourceAdapter for JDBC - so if you're looking for open source XA + JMS + JDBC, I'd recommend that bundle.
But like anything - test your configuration in your environment - *especially* recovery - to make sure it works for you.
BTW I've had similar conversations with some customers; they have in the past used HTTP instead of JMS. When I ask, how do you avoid duplicates and ensure reliability?
They say - oh if a HTTP operation fails, we try it again - then we check the database every night for duplicate records and delete 'em...
I would assume that IF you need JMS, you most probably need Transactions spanning JDBC and JMS.
I have always used JMS in conjunction with JDBC, as in: Do some work, update the DB, and send a message.
Now this would always need to happen in a transaction. Why would you send the message if the work was not done?
At the risk of sounding arrogant, if you only want to have a web frontend to a db, you would be better off using Ruby on Rails
I'm a little conservative by nature in the sense that I don't like introducing a new technology (e.g. XA) into the mix unless I'm sure I *absolutely* need it. To reach this conclusion, I guess I need to *absolutely* understand it (exaggeration perhaps :-)).
Transaction Management is something I've never really felt comfortable with for all the reasons outlined here - what will and will not the different implementations really give me etc. For this reason, I've always (rightly or wrongly) sorta side-stepped it for fear that it will only replace one problem with a different problem that I understand even less.
What might be useful would be an unbiased "XA/TX use-case to Vendor-Support Matrix" of some sort that can help poor people like myself understand when and where I can use XA implementations, and where my WebLogic/WebSphere etc. will actually be doing something for me - other than possibly creating a bigger problem for me.
Norman,
For what it's worth I pointed that out in the mail thread. James said there wasn't a stable build out yet of JBoss + opensource Arjuna, but that he'd keep an eye on it.
I also suggested Geronimo / Jencks since my investigations showed that they've implemented recovery in the TM and that JDBC + ActiveMQ XA transactions are supported. James (rightly) pointed out that the Geronimo stack is still pretty new and therefore dangerous.
BTW Using XA transactions to solve the "do some work in the database, then send a message to signify that work was done" model is broken. In other words the sequence
1> OPEN XA Transaction.
2> WRITE to Database.
3> WRITE message to Queue (signifying write to database.)
4> COMMIT XA Transaction
causes a race condition that will/can bite you at some point in time. As I understand it there is no guarantee that the actual commit to the database will happen "at the same time" or before the commit to the queue. Consequently, it is possible for a message receiver to receive the message from the queue, then check the database and discover that the record the message is referring to isn't there. This is because the XA protocol only "guarantees" that all participants in the transaction will eventually persist their load, but it makes no promises about the order or even when the respective persists will occur.
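The window described above can be modelled with a small simulation. This is only a sketch under stated assumptions: `Store` and `commitAndObserve` are hypothetical names, plain lists stand in for the queue and the database, and the coordinator here deliberately commits the queue first to show the unlucky ordering (a real transaction manager may pick either order).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy two-phase commit showing the gap between the two phase-two commits. */
final class XaCommitWindow {
    /** Hypothetical participant: writes stay invisible until commit() is called. */
    static final class Store {
        private final List<String> pending = new ArrayList<>();
        private final List<String> visible = new ArrayList<>();

        void write(String value) { pending.add(value); }
        boolean prepare() { return true; } // toy participant always votes yes
        void commit() { visible.addAll(pending); pending.clear(); }
        boolean sees(String value) { return visible.contains(value); }
    }

    /** Runs both phases and records what an outside reader would see after each commit. */
    static List<String> commitAndObserve(Store queue, Store database, String key) {
        List<String> observations = new ArrayList<>();
        if (!(queue.prepare() && database.prepare())) {   // phase 1: collect votes
            throw new IllegalStateException("a participant voted no");
        }
        queue.commit();                                   // phase 2, first participant
        observations.add("message=" + queue.sees(key) + " row=" + database.sees(key));
        database.commit();                                // phase 2, second participant
        observations.add("message=" + queue.sees(key) + " row=" + database.sees(key));
        return observations;
    }

    public static void main(String[] args) {
        Store queue = new Store();
        Store database = new Store();
        database.write("order-42"); // step 2: WRITE to database
        queue.write("order-42");    // step 3: WRITE message to queue
        for (String line : commitAndObserve(queue, database, "order-42")) {
            System.out.println(line);
        }
    }
}
```

A receiver that reads between the two commits sees the message but not the row, which is exactly the case the sequence above cannot rule out.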
In my opinion, most times developers decide to use XA, it is to solve a problem that XA isn't really suited to solve, and they would do a lot better to forget XA altogether and focus on changing their design instead. This is not always possible, but 9 times out of 10, when I see XA employed, it is unnecessary, and may indeed actually be harmful.
I know XA quite well. I still use JBoss so when I do need features such as XA in the future I can move up to a "real" application server. You get what you pay for.
Actually, Mike and Shai, with the acquisition of Arjuna, JBAS has excellent XA support (granted, this is going to be out in the 4.0.4 release). I question whether it is needed most of the time. I also question whether most developers even "get" local transactions (I can tell you from experience that many do not), never mind 2PC.
There are many apps that need to do JMS and DB in the same transaction. Though granted it isn't a majority and much of the time if you just back your JMS with the same DB you can fake it anyhow.
I think you'll also find that XA support in most databases (including the ones that you would think otherwise) is not what you might hope it to be. You'll even find that the XA standard has HUGE gaps that make every deployment vendor specific.
I'm also shocked to hear that Spring and maybe even Ruby On Rails do not automatically solve all such worries for me. I was hoping to retire early.
So in short, you can not have your XA with your JBoss too, XA isn't everything, XA is not a first resort and holy crap Spring and RoR don't solve the need for this???
I reckon James is wrong. It is almost always possible to coordinate a JDBC and a JMS connection without transactions.
For example, take a program that removes messages from a queue and places the data in a database. We did this recently and used the following algorithm:
1. Take the next message off the queue
2. If the message data is already in the database, commit the JMS connection, and go back to step 1.
3. If the message data isn't in the database, insert it, and commit the JDBC connection
4. Commit the JMS connection
5. Go back to step 1.
A failure between steps 3 and 4 would cause us to take the same message off the queue twice. However, because it checks whether the message is already in the database before inserting it, the algorithm is robust against taking the same message more than once.
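The steps above can be sketched as a small in-memory model. It is a toy under stated assumptions, not the poster's actual code: `IdempotentConsumer` and `processNext` are hypothetical names, a `Deque` stands in for the JMS queue, a `Set` stands in for the database table, and the `crashBeforeJmsCommit` flag simulates a failure between steps 3 and 4.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the check-then-insert consumer; no real JMS or JDBC is involved. */
final class IdempotentConsumer {
    // Stand-in for the database table, keyed by message data.
    private final Set<String> database = new LinkedHashSet<>();
    // Stand-in for the JMS queue; the head stays in place until the "JMS commit".
    private final Deque<String> queue = new ArrayDeque<>();

    IdempotentConsumer(List<String> messages) {
        queue.addAll(messages);
    }

    /**
     * Processes one message and returns false once the queue is empty.
     * When crashBeforeJmsCommit is true and an insert happens, the method
     * returns before the JMS commit, so the message will be delivered again.
     */
    boolean processNext(boolean crashBeforeJmsCommit) {
        String message = queue.peek();          // step 1: receive without committing
        if (message == null) {
            return false;
        }
        if (!database.contains(message)) {      // step 2: already stored? then skip the insert
            database.add(message);              // step 3: insert and commit JDBC
            if (crashBeforeJmsCommit) {
                return true;                    // simulated failure: message stays on the queue
            }
        }
        queue.poll();                           // step 4: commit JMS, removing the message
        return true;                            // step 5: caller loops back to step 1
    }

    Set<String> rows() { return database; }
    int pending() { return queue.size(); }

    public static void main(String[] args) {
        IdempotentConsumer consumer = new IdempotentConsumer(List.of("order-1", "order-2"));
        consumer.processNext(true);             // crash after the DB commit for order-1
        while (consumer.processNext(false)) {
            // keep draining; order-1 is redelivered and skipped
        }
        System.out.println(consumer.rows() + " pending=" + consumer.pending());
    }
}
```

In this model the redelivered message is recognised in step 2 and simply acknowledged, so each row is stored once even though one message was taken off the queue twice.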
For our app, the overhead of the per-message database read more than made up for the cost and complexity of XA (not to mention the need for some poor bunny to have to check for and manually resolve in-doubt transactions). There are several ways the algorithm could be modified to reduce this overhead.
> Have never considered an application might have two databases
I had an antibody order tracking and reporting application which used four databases.
Another clone and peptide order tracking and reporting application used two.
I think you are being overly sensationalistic.
Alan -
This works in some scenarios - particular subsets of the situation where you are reading from a queue and inserting into a db. But even then it isn't bulletproof. What if some other part of the process finds the record in the db, acts on it, and deletes the row before the message is found again in the queue (think of an after-failure scenario here)?
Additionally, if your scenario is flipped (find a record in the db, send a JMS message, remove/update the record), how can you be sure to only update/remove the record if the message was sent, without also causing the possibility of a duplicate message send? There are some *very* complex ways to deal with this without XA, but you end up creating extra queues, or using topics instead of queues, and you eventually end up having your whole application architecture warped by the need to work around the lack of XA.
Response to Brian's post about MQ:
Not true. WebSphere MQ has had a transactional client which works over TCP and does not require "binding mode" for quite some time now - at least 3 or more years.
I concur with Alan on this. If you only consume messages from a queue and update _one_ database (quite a common scenario) you don't need XA. All you need is a mechanism for duplicate delivery detection. What works best is a table with a unique index on JMSMessageId: the process which updates the database (on the same connection) just tries to insert a new row for every message, and the insert fails if it's a duplicate delivery. You can find more details in our article published on dev2dev: http://dev2dev.bea.com/pub/a/2006/01/custom-mdb-processing.html
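As a rough illustration of the unique-index approach, the sketch below uses plain collections in place of database tables. Everything named here is an assumption for the example: `DedupByMessageId`, `DuplicateKeyException` and `onMessage` are hypothetical, the `Set` imitates a table with a unique index on the message id, and the "balances" map is an invented business table. It is not taken from the linked article.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of duplicate detection via a unique index on the message id. */
final class DedupByMessageId {
    /** Hypothetical stand-in for the constraint violation a real database would raise. */
    static final class DuplicateKeyException extends RuntimeException {
        DuplicateKeyException(String id) { super("duplicate message id: " + id); }
    }

    private final Set<String> processedIds = new HashSet<>();      // table with the unique index
    private final Map<String, Integer> balances = new HashMap<>(); // invented business table

    /** Inserts the id, or fails the way a unique index would on a repeated key. */
    private void insertProcessedId(String messageId) {
        if (!processedIds.add(messageId)) {
            throw new DuplicateKeyException(messageId);
        }
    }

    /**
     * Handles one delivery. In a real system the id insert and the business
     * update would share one local database transaction on one connection.
     * Returns true if the update was applied, false for a duplicate delivery.
     */
    boolean onMessage(String messageId, String account, int amount) {
        try {
            insertProcessedId(messageId);                  // fails on redelivery
            balances.merge(account, amount, Integer::sum); // business update
            return true;                                   // the local commit would go here
        } catch (DuplicateKeyException e) {
            // Nothing was changed in this toy; a real database would roll back here.
            return false;
        }
    }

    int balance(String account) {
        return balances.getOrDefault(account, 0);
    }

    public static void main(String[] args) {
        DedupByMessageId handler = new DedupByMessageId();
        System.out.println(handler.onMessage("ID:1", "acct", 100)); // first delivery
        System.out.println(handler.onMessage("ID:1", "acct", 100)); // redelivery
        System.out.println(handler.balance("acct"));
    }
}
```

Compared with checking first and inserting second, this variant lets the database enforce uniqueness, so the handler needs no separate read before each insert.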
Apparently being one of the few that has done multiple projects with at least 2 datasources for one application (over a cluster of app servers, either round robin with sticky sessions or a true cluster), I can honestly say that XA is the most buggy transaction mechanism ever.
My preferred platform for enterprise is still WebLogic with Oracle (preferably in a RAC setup when performance and failover are needed), though the issues with the XA support have cost me more time (and thus the client's money) than I want to count.
Though I have to agree that most projects out there are single datasource, non-XA and simple CRUD, I would love to see the vendors pick up the XA support. If the opensource community will be able to provide good XA support I'd be glad, but if a commercial vendor with good support and SLA can pick it up it's got my blessing (and a new client ;-).
I'm in the 2% that I actually believe is more like 10%.
``If you need 2 phase commit, XA across JDBC and external JMS, fast performance, session + server clustering and XA with recovery - you're going to be shelling out for an expensive app server (really WebLogic or WebSphere).
My contention is that the reason the Open Source community doesn't provide this yet is quite simple.
Most people don't need any of that to build their Java/J2EE applications. ``
Right. And here I thought it was because it's extremely difficult to write the type of software you're describing, especially getting it right. It takes years of effort, which IBM & BEA have invested.
I'm all for open source software but you sound like the Linux guys 5 years ago when they explained why Linux didn't handle multiple CPUs (SMP) > 2: "Who needs that?"
As one who promotes working asynchronously as much as possible, I have yet to need XA. There is nearly always a way to have a good design for transactions without needing 2PC or XA.