upidiff.diff
Class Diff

java.lang.Object
  |
  +--upidiff.diff.Diff

public final class Diff
extends Object

A class to compare vectors of objects. The result of comparison is a list of change objects which form an edit script. The objects compared are traditionally lines of text from two files. Comparison options such as "ignore whitespace" are implemented by modifying the equals and hashcode methods for the objects compared.

The basic algorithm is described in:
"An O(ND) Difference Algorithm and its Variations", Eugene Myers, Algorithmica Vol. 1 No. 2, 1986, p 251.

This class outputs different results from GNU diff 1.15 on some inputs. Our results are actually better (smaller change list, smaller total size of changes), but it would be nice to know why. Perhaps there is a memory overwrite bug in GNU diff 1.15.

Author:
Stuart D. Gathman, translated from GNU diff 1.15 Copyright (C) 2000 Business Management Systems, Inc.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 1, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.


Nested Class Summary
(package private)  class Diff.Change
          The result of comparison is an "edit script": a chain of change objects.
(package private)  class Diff.FileData
          Data on one input file being compared.
 
Field Summary
private  int[] bdiag
           
private  int bdiagoff
           
private  int cost
           
private  int equiv_max
          1 more than the maximum equivalence value used for this or its sibling file.
private  int[] fdiag
           
private  int fdiagoff
           
private  Diff.FileData[] filevec
           
 boolean heuristic
          When set to true, the comparison uses a heuristic to speed it up.
private  boolean inhibit
           
 boolean no_discards
          When set to true, the algorithm returns a guarranteed minimal set of changes.
private  int[] xvec
           
private  int[] yvec
           
 
Constructor Summary
Diff(Object[] a, Object[] b)
          Prepare to find differences between two arrays.
 
Method Summary
private  Diff.Change build_reverse_script()
          Scan the tables of which lines are inserted and deleted, producing an edit script in reverse order.
private  Diff.Change build_script()
          Scan the tables of which lines are inserted and deleted, producing an edit script in forward order.
 Diff.Change calculateDiff()
           
private  void compareseq(int xoff, int xlim, int yoff, int ylim)
          Compare in detail contiguous subsequences of the two files which are known, as a whole, to match each other.
private  int diag(int xoff, int xlim, int yoff, int ylim)
          Find the midpoint of the shortest edit script for a specified portion of the two files.
private  void discard_confusing_lines()
          Discard lines from one file that have no matches in the other file.
private  void shift_boundaries()
          Adjust inserts/deletes of blank lines to join changes as much as possible.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

equiv_max

private int equiv_max
1 more than the maximum equivalence value used for this or its sibling file.


heuristic

public boolean heuristic
When set to true, the comparison uses a heuristic to speed it up. With this heuristic, for files with a constant small density of changes, the algorithm is linear in the file size.


no_discards

public boolean no_discards
When set to true, the algorithm returns a guarranteed minimal set of changes. This makes things slower, sometimes much slower.


xvec

private int[] xvec

yvec

private int[] yvec

fdiag

private int[] fdiag

bdiag

private int[] bdiag

fdiagoff

private int fdiagoff

bdiagoff

private int bdiagoff

filevec

private final Diff.FileData[] filevec

cost

private int cost

inhibit

private boolean inhibit
Constructor Detail

Diff

public Diff(Object[] a,
            Object[] b)
Prepare to find differences between two arrays. Each element of the arrays is translated to an "equivalence number" based on the result of equals. The original Object arrays are no longer needed for computing the differences. They will be needed again later to print the results of the comparison as an edit script, if desired.

Method Detail

diag

private int diag(int xoff,
                 int xlim,
                 int yoff,
                 int ylim)
Find the midpoint of the shortest edit script for a specified portion of the two files. We scan from the beginnings of the files, and simultaneously from the ends, doing a breadth-first search through the space of edit-sequence. When the two searches meet, we have found the midpoint of the shortest edit sequence. The value returned is the number of the diagonal on which the midpoint lies. The diagonal number equals the number of inserted lines minus the number of deleted lines (counting only lines before the midpoint). The edit cost is stored into COST; this is the total number of lines inserted or deleted (counting only lines before the midpoint). This function assumes that the first lines of the specified portions of the two files do not match, and likewise that the last lines do not match. The caller must trim matching lines from the beginning and end of the portions it is going to specify. Note that if we return the "wrong" diagonal value, or if the value of bdiag at that diagonal is "wrong", the worst this can do is cause suboptimal diff output. It cannot cause incorrect diff output.


compareseq

private void compareseq(int xoff,
                        int xlim,
                        int yoff,
                        int ylim)
Compare in detail contiguous subsequences of the two files which are known, as a whole, to match each other. The results are recorded in the vectors filevec[N].changed_flag, by storing a 1 in the element for each line that is an insertion or deletion. The subsequence of file 0 is [XOFF, XLIM) and likewise for file 1. Note that XLIM, YLIM are exclusive bounds. All line numbers are origin-0 and discarded lines are not counted.


discard_confusing_lines

private void discard_confusing_lines()
Discard lines from one file that have no matches in the other file.


shift_boundaries

private void shift_boundaries()
Adjust inserts/deletes of blank lines to join changes as much as possible.


build_reverse_script

private Diff.Change build_reverse_script()
Scan the tables of which lines are inserted and deleted, producing an edit script in reverse order.


build_script

private Diff.Change build_script()
Scan the tables of which lines are inserted and deleted, producing an edit script in forward order.


calculateDiff

public Diff.Change calculateDiff()