Tom Davies, Crucible Developer

For this Fedex day I chose to implement HTML version diffs for Confluence.

Confluence shows the differences between versions in terms of the markup that produced each version of a page, which isn't always very clear. Often a paragraph is duplicated, with additions in one copy and deletions in the other:

[Screenshot: paradiff.jpg]

This looks better when the HTML itself is diffed. Or rather, it would if the algorithm I chose hadn't made some unfortunate choices. I converted Python's difflib to Java, because the only HTML diff I found via Google was based on it. I think I would have been better off using jrcs, the library we use now, even though it seems to be a bit of an orphan. Here is the HTML diff:

[Screenshot: htmlpara.jpg]

The strategy I use is:

  1. Tokenize each version of the HTML into tags and words (so <img width=200 height=100 src="/xxx/foo.jpg"/> is a single token as far as the diff is concerned, but <p>A Paragraph</p> is four).
  2. Run the diff algorithm and concatenate the output of all the operations it produces: 'equal', 'changed' (which becomes an insert followed by a delete), 'added' and 'deleted'. Any text tokens are wrapped in a <span> with an appropriate class, as are some other tags (like IMG).
  3. Parse the HTML produced by the previous step into a DOM tree and traverse it, marking block-level constructs (like <P> and <TR>) with a class indicating that part of their contents has changed. That produces the blue lines in the margin.
  4. Replace the <a ... /> tags that Neko produced in the previous step with <a ...></a>, because the self-closing form breaks Safari and IE (at least).
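To make steps 1, 2 and 4 more concrete, here is a minimal Java sketch. It is not the code described above: it swaps in a plain longest-common-subsequence diff for the ported difflib, leaves out the DOM pass of step 3, and the class names diff-added and diff-deleted, like all the method names, are hypothetical. Only words are wrapped in spans here; tags pass through untouched, so the output can be malformed HTML, which is the kind of thing a parser like Neko tidies up in step 3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of steps 1, 2 and 4; a plain LCS diff stands in for the ported difflib.
public class HtmlDiffSketch {

    // A whole tag (from '<' to the next '>') or a run of non-space, non-'<' characters.
    private static final Pattern TOKEN = Pattern.compile("<[^>]*>|[^<\\s]+");

    // A self-closing anchor such as <a name="top"/>; group 1 holds the attributes (possibly empty).
    private static final Pattern SELF_CLOSING_A = Pattern.compile("<a((?:\\s[^>]*?)?)\\s*/>");

    // Step 1: split HTML into tag tokens and word tokens, discarding whitespace.
    static List<String> tokenize(String html) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Step 2: diff two token lists and concatenate the output of every operation.
    static String diff(List<String> a, List<String> b) {
        int n = a.size(), m = b.size();
        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a.get(i).equals(b.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n || j < m) {
            if (i < n && j < m && a.get(i).equals(b.get(j))) {
                out.add(a.get(i));                       // equal
                i++;
                j++;
            } else if (j < m && (i == n || lcs[i][j + 1] >= lcs[i + 1][j])) {
                out.add(mark(b.get(j), "diff-added"));   // added (emitted first for a change)
                j++;
            } else {
                out.add(mark(a.get(i), "diff-deleted")); // deleted
                i++;
            }
        }
        // Tokens are rejoined with single spaces, which only approximates the original spacing.
        return String.join(" ", out);
    }

    // Wraps a changed word in a classed span; tags pass through unwrapped in this sketch.
    static String mark(String token, String cssClass) {
        if (token.startsWith("<")) {
            return token;
        }
        return "<span class=\"" + cssClass + "\">" + token + "</span>";
    }

    // Step 4: rewrite <a .../> as <a ...></a>.
    static String expandSelfClosingAnchors(String html) {
        return SELF_CLOSING_A.matcher(html).replaceAll("<a$1></a>");
    }

    public static void main(String[] args) {
        List<String> oldTokens = tokenize("<p>A Paragraph</p>");
        List<String> newTokens = tokenize("<p>A Revised Paragraph</p>");
        System.out.println(diff(oldTokens, newTokens));
        System.out.println(expandSelfClosingAnchors("<p><a name=\"top\"/>Intro</p>"));
    }
}
```

Running main prints <p> A <span class="diff-added">Revised</span> Paragraph </p> followed by <p><a name="top"></a>Intro</p>. When one word replaces another, this sketch emits the insert before the delete, matching the order described in step 2.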

There's still a lot to do:

  1. Try jrcs instead of my Python difflib conversion.
  2. Apply the change anchors more sparingly: just once to each element that warrants a blue marker.
  3. Figure out what to do with lists whose only change is indentation; these are not handled well at present.
  4. Look at more corner cases. For instance, what about a change that only alters the class of a DIV? How can we show that?
