Akshayram1 committed
Commit 24d15a4 · verified · 1 parent: db99747

Upload diff_match_patch.py

Files changed (1)
  1. diff_match_patch.py +1907 -0
diff_match_patch.py ADDED
@@ -0,0 +1,1907 @@
+#!/usr/bin/python3
+
+"""Diff Match and Patch
+Copyright 2018 The diff-match-patch Authors.
+https://github.com/google/diff-match-patch
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
+"""Functions for diff, match and patch.
+
+Computes the difference between two texts to create a patch.
+Applies the patch onto another text, allowing for errors.
+"""
+
+__author__ = '[email protected] (Neil Fraser)'
+
+import re
+import sys
+import time
+import urllib.parse
+
+
+class diff_match_patch:
+  """Class containing the diff, match and patch methods.
+
+  Also contains the behaviour settings.
+  """
+
+  def __init__(self):
+    """Inits a diff_match_patch object with default settings.
+    Redefine these in your program to override the defaults.
+    """
+
+    # Number of seconds to map a diff before giving up (0 for infinity).
+    self.Diff_Timeout = 1.0
+    # Cost of an empty edit operation in terms of edit characters.
+    self.Diff_EditCost = 4
+    # At what point is no match declared (0.0 = perfection, 1.0 = very loose).
+    self.Match_Threshold = 0.5
+    # How far to search for a match (0 = exact location, 1000+ = broad match).
+    # A match this many characters away from the expected location will add
+    # 1.0 to the score (0.0 is a perfect match).
+    self.Match_Distance = 1000
+    # When deleting a large block of text (over ~64 characters), how close do
+    # the contents have to be to match the expected contents. (0.0 = perfection,
+    # 1.0 = very loose). Note that Match_Threshold controls how closely the
+    # end points of a delete need to match.
+    self.Patch_DeleteThreshold = 0.5
+    # Chunk size for context length.
+    self.Patch_Margin = 4
+
+    # The number of bits in an int.
+    # Python has no maximum, thus to disable patch splitting set to 0.
+    # However to avoid long patches in certain pathological cases, use 32.
+    # Multiple short patches (using native ints) are much faster than long ones.
+    self.Match_MaxBits = 32
+
+  # DIFF FUNCTIONS
+
+  # The data structure representing a diff is an array of tuples:
+  # [(DIFF_DELETE, "Hello"), (DIFF_INSERT, "Goodbye"), (DIFF_EQUAL, " world.")]
+  # which means: delete "Hello", add "Goodbye" and keep " world."
+  DIFF_DELETE = -1
+  DIFF_INSERT = 1
+  DIFF_EQUAL = 0
+
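Reviewer note: the tuple list described in the comment above can be read back into both texts. Deletions plus equalities spell the old text, and insertions plus equalities spell the new one. A minimal standalone sketch, assuming nothing from this file beyond the three operation codes (the helper names `source_text` and `target_text` are illustrative and do not appear in the diff):

```python
# Operation codes mirroring the class constants in the diff above.
DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0


def source_text(diffs):
    """Rebuild the old text: keep everything except insertions."""
    return "".join(text for op, text in diffs if op != DIFF_INSERT)


def target_text(diffs):
    """Rebuild the new text: keep everything except deletions."""
    return "".join(text for op, text in diffs if op != DIFF_DELETE)


diffs = [(DIFF_DELETE, "Hello"), (DIFF_INSERT, "Goodbye"), (DIFF_EQUAL, " world.")]
old, new = source_text(diffs), target_text(diffs)
```

Round-tripping a diff this way is a cheap sanity check when experimenting with the cleanup passes further down, since none of them should change either reconstructed text.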
+  def diff_main(self, text1, text2, checklines=True, deadline=None):
+    """Find the differences between two texts. Simplifies the problem by
+      stripping any common prefix or suffix off the texts before diffing.
+
+    Args:
+      text1: Old string to be diffed.
+      text2: New string to be diffed.
+      checklines: Optional speedup flag. If present and false, then don't run
+        a line-level diff first to identify the changed areas.
+        Defaults to true, which does a faster, slightly less optimal diff.
+      deadline: Optional time when the diff should be complete by. Used
+        internally for recursive calls. Users should set Diff_Timeout instead.
+
+    Returns:
+      Array of changes.
+    """
+    # Set a deadline by which time the diff must be complete.
+    if deadline == None:
+      # Unlike in most languages, Python counts time in seconds.
+      if self.Diff_Timeout <= 0:
+        deadline = sys.maxsize
+      else:
+        deadline = time.time() + self.Diff_Timeout
+
+    # Check for null inputs.
+    if text1 == None or text2 == None:
+      raise ValueError("Null inputs. (diff_main)")
+
+    # Check for equality (speedup).
+    if text1 == text2:
+      if text1:
+        return [(self.DIFF_EQUAL, text1)]
+      return []
+
+    # Trim off common prefix (speedup).
+    commonlength = self.diff_commonPrefix(text1, text2)
+    commonprefix = text1[:commonlength]
+    text1 = text1[commonlength:]
+    text2 = text2[commonlength:]
+
+    # Trim off common suffix (speedup).
+    commonlength = self.diff_commonSuffix(text1, text2)
+    if commonlength == 0:
+      commonsuffix = ''
+    else:
+      commonsuffix = text1[-commonlength:]
+      text1 = text1[:-commonlength]
+      text2 = text2[:-commonlength]
+
+    # Compute the diff on the middle block.
+    diffs = self.diff_compute(text1, text2, checklines, deadline)
+
+    # Restore the prefix and suffix.
+    if commonprefix:
+      diffs[:0] = [(self.DIFF_EQUAL, commonprefix)]
+    if commonsuffix:
+      diffs.append((self.DIFF_EQUAL, commonsuffix))
+    self.diff_cleanupMerge(diffs)
+    return diffs
+
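Reviewer note: the prefix and suffix trimming in `diff_main` can be illustrated in isolation. The sketch below uses a plain linear scan rather than the binary search this file uses in `diff_commonPrefix` and `diff_commonSuffix`, and `split_common` is a hypothetical helper name, so treat it as an illustration of the idea only:

```python
def split_common(text1, text2):
    """Return (prefix, middle1, middle2, suffix) for two strings."""
    # Length of the shared prefix (linear scan, for clarity).
    limit = min(len(text1), len(text2))
    p = 0
    while p < limit and text1[p] == text2[p]:
        p += 1
    prefix, text1, text2 = text1[:p], text1[p:], text2[p:]

    # Length of the shared suffix of what remains after the prefix is gone.
    limit = min(len(text1), len(text2))
    s = 0
    while s < limit and text1[-1 - s] == text2[-1 - s]:
        s += 1
    if s:
        # Guard against s == 0, because text[-0:] would be the whole string.
        suffix, text1, text2 = text1[-s:], text1[:-s], text2[:-s]
    else:
        suffix = ""
    return prefix, text1, text2, suffix


parts = split_common("The cat sat.", "The dog sat.")
```

Only the two middle pieces need the expensive diff. The prefix and suffix are re-attached afterwards as equalities.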
+  def diff_compute(self, text1, text2, checklines, deadline):
+    """Find the differences between two texts. Assumes that the texts do not
+      have any common prefix or suffix.
+
+    Args:
+      text1: Old string to be diffed.
+      text2: New string to be diffed.
+      checklines: Speedup flag. If false, then don't run a line-level diff
+        first to identify the changed areas.
+        If true, then run a faster, slightly less optimal diff.
+      deadline: Time when the diff should be complete by.
+
+    Returns:
+      Array of changes.
+    """
+    if not text1:
+      # Just add some text (speedup).
+      return [(self.DIFF_INSERT, text2)]
+
+    if not text2:
+      # Just delete some text (speedup).
+      return [(self.DIFF_DELETE, text1)]
+
+    if len(text1) > len(text2):
+      (longtext, shorttext) = (text1, text2)
+    else:
+      (shorttext, longtext) = (text1, text2)
+    i = longtext.find(shorttext)
+    if i != -1:
+      # Shorter text is inside the longer text (speedup).
+      diffs = [(self.DIFF_INSERT, longtext[:i]), (self.DIFF_EQUAL, shorttext),
+               (self.DIFF_INSERT, longtext[i + len(shorttext):])]
+      # Swap insertions for deletions if diff is reversed.
+      if len(text1) > len(text2):
+        diffs[0] = (self.DIFF_DELETE, diffs[0][1])
+        diffs[2] = (self.DIFF_DELETE, diffs[2][1])
+      return diffs
+
+    if len(shorttext) == 1:
+      # Single character string.
+      # After the previous speedup, the character can't be an equality.
+      return [(self.DIFF_DELETE, text1), (self.DIFF_INSERT, text2)]
+
+    # Check to see if the problem can be split in two.
+    hm = self.diff_halfMatch(text1, text2)
+    if hm:
+      # A half-match was found, sort out the return data.
+      (text1_a, text1_b, text2_a, text2_b, mid_common) = hm
+      # Send both pairs off for separate processing.
+      diffs_a = self.diff_main(text1_a, text2_a, checklines, deadline)
+      diffs_b = self.diff_main(text1_b, text2_b, checklines, deadline)
+      # Merge the results.
+      return diffs_a + [(self.DIFF_EQUAL, mid_common)] + diffs_b
+
+    if checklines and len(text1) > 100 and len(text2) > 100:
+      return self.diff_lineMode(text1, text2, deadline)
+
+    return self.diff_bisect(text1, text2, deadline)
+
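Reviewer note: the "shorter text is inside the longer text" shortcut in `diff_compute` is easy to try on its own. A minimal sketch follows, with `containment_diff` as a hypothetical name. Unlike the method above it picks the operation code up front instead of swapping it afterwards, and for arbitrary inputs it may emit empty-string edits that the real code leaves to a later merge pass:

```python
DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0


def containment_diff(text1, text2):
    """Return a three-part diff if one text contains the other, else None."""
    if len(text1) > len(text2):
        # The old text is longer, so the surplus around the match is deleted.
        longtext, shorttext, op = text1, text2, DIFF_DELETE
    else:
        # The new text is longer (or equal), so the surplus is inserted.
        longtext, shorttext, op = text2, text1, DIFF_INSERT
    i = longtext.find(shorttext)
    if i == -1:
        return None
    return [(op, longtext[:i]),
            (DIFF_EQUAL, shorttext),
            (op, longtext[i + len(shorttext):])]


grown = containment_diff("abc", "xxabcyy")
shrunk = containment_diff("xxabcyy", "abc")
```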
+  def diff_lineMode(self, text1, text2, deadline):
+    """Do a quick line-level diff on both strings, then rediff the parts for
+      greater accuracy.
+      This speedup can produce non-minimal diffs.
+
+    Args:
+      text1: Old string to be diffed.
+      text2: New string to be diffed.
+      deadline: Time when the diff should be complete by.
+
+    Returns:
+      Array of changes.
+    """
+
+    # Scan the text on a line-by-line basis first.
+    (text1, text2, linearray) = self.diff_linesToChars(text1, text2)
+
+    diffs = self.diff_main(text1, text2, False, deadline)
+
+    # Convert the diff back to original text.
+    self.diff_charsToLines(diffs, linearray)
+    # Eliminate freak matches (e.g. blank lines)
+    self.diff_cleanupSemantic(diffs)
+
+    # Rediff any replacement blocks, this time character-by-character.
+    # Add a dummy entry at the end.
+    diffs.append((self.DIFF_EQUAL, ''))
+    pointer = 0
+    count_delete = 0
+    count_insert = 0
+    text_delete = ''
+    text_insert = ''
+    while pointer < len(diffs):
+      if diffs[pointer][0] == self.DIFF_INSERT:
+        count_insert += 1
+        text_insert += diffs[pointer][1]
+      elif diffs[pointer][0] == self.DIFF_DELETE:
+        count_delete += 1
+        text_delete += diffs[pointer][1]
+      elif diffs[pointer][0] == self.DIFF_EQUAL:
+        # Upon reaching an equality, check for prior redundancies.
+        if count_delete >= 1 and count_insert >= 1:
+          # Delete the offending records and add the merged ones.
+          subDiff = self.diff_main(text_delete, text_insert, False, deadline)
+          diffs[pointer - count_delete - count_insert : pointer] = subDiff
+          pointer = pointer - count_delete - count_insert + len(subDiff)
+        count_insert = 0
+        count_delete = 0
+        text_delete = ''
+        text_insert = ''
+
+      pointer += 1
+
+    diffs.pop()  # Remove the dummy entry at the end.
+
+    return diffs
+
+  def diff_bisect(self, text1, text2, deadline):
+    """Find the 'middle snake' of a diff, split the problem in two
+      and return the recursively constructed diff.
+      See Myers 1986 paper: An O(ND) Difference Algorithm and Its Variations.
+
+    Args:
+      text1: Old string to be diffed.
+      text2: New string to be diffed.
+      deadline: Time at which to bail if not yet complete.
+
+    Returns:
+      Array of diff tuples.
+    """
+
+    # Cache the text lengths to prevent multiple calls.
+    text1_length = len(text1)
+    text2_length = len(text2)
+    max_d = (text1_length + text2_length + 1) // 2
+    v_offset = max_d
+    v_length = 2 * max_d
+    v1 = [-1] * v_length
+    v1[v_offset + 1] = 0
+    v2 = v1[:]
+    delta = text1_length - text2_length
+    # If the total number of characters is odd, then the front path will
+    # collide with the reverse path.
+    front = (delta % 2 != 0)
+    # Offsets for start and end of k loop.
+    # Prevents mapping of space beyond the grid.
+    k1start = 0
+    k1end = 0
+    k2start = 0
+    k2end = 0
+    for d in range(max_d):
+      # Bail out if deadline is reached.
+      if time.time() > deadline:
+        break
+
+      # Walk the front path one step.
+      for k1 in range(-d + k1start, d + 1 - k1end, 2):
+        k1_offset = v_offset + k1
+        if k1 == -d or (k1 != d and
+            v1[k1_offset - 1] < v1[k1_offset + 1]):
+          x1 = v1[k1_offset + 1]
+        else:
+          x1 = v1[k1_offset - 1] + 1
+        y1 = x1 - k1
+        while (x1 < text1_length and y1 < text2_length and
+               text1[x1] == text2[y1]):
+          x1 += 1
+          y1 += 1
+        v1[k1_offset] = x1
+        if x1 > text1_length:
+          # Ran off the right of the graph.
+          k1end += 2
+        elif y1 > text2_length:
+          # Ran off the bottom of the graph.
+          k1start += 2
+        elif front:
+          k2_offset = v_offset + delta - k1
+          if k2_offset >= 0 and k2_offset < v_length and v2[k2_offset] != -1:
+            # Mirror x2 onto top-left coordinate system.
+            x2 = text1_length - v2[k2_offset]
+            if x1 >= x2:
+              # Overlap detected.
+              return self.diff_bisectSplit(text1, text2, x1, y1, deadline)
+
+      # Walk the reverse path one step.
+      for k2 in range(-d + k2start, d + 1 - k2end, 2):
+        k2_offset = v_offset + k2
+        if k2 == -d or (k2 != d and
+            v2[k2_offset - 1] < v2[k2_offset + 1]):
+          x2 = v2[k2_offset + 1]
+        else:
+          x2 = v2[k2_offset - 1] + 1
+        y2 = x2 - k2
+        while (x2 < text1_length and y2 < text2_length and
+               text1[-x2 - 1] == text2[-y2 - 1]):
+          x2 += 1
+          y2 += 1
+        v2[k2_offset] = x2
+        if x2 > text1_length:
+          # Ran off the left of the graph.
+          k2end += 2
+        elif y2 > text2_length:
+          # Ran off the top of the graph.
+          k2start += 2
+        elif not front:
+          k1_offset = v_offset + delta - k2
+          if k1_offset >= 0 and k1_offset < v_length and v1[k1_offset] != -1:
+            x1 = v1[k1_offset]
+            y1 = v_offset + x1 - k1_offset
+            # Mirror x2 onto top-left coordinate system.
+            x2 = text1_length - x2
+            if x1 >= x2:
+              # Overlap detected.
+              return self.diff_bisectSplit(text1, text2, x1, y1, deadline)
+
+    # Diff took too long and hit the deadline or
+    # number of diffs equals number of characters, no commonality at all.
+    return [(self.DIFF_DELETE, text1), (self.DIFF_INSERT, text2)]
+
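Reviewer note: `diff_bisect` runs the Myers search from both ends at once and splits where the two frontiers meet. The core of that search is easier to see in a forward-only form that just counts edits. The sketch below is that simplified variant, not the bidirectional code in this file. `myers_distance` is a hypothetical name, and it stores the frontier in a dict instead of the offset list used above:

```python
def myers_distance(a, b):
    """Smallest number of single-character inserts and deletes turning a into b."""
    n, m = len(a), len(b)
    # v[k] is the furthest x reached so far on diagonal k, where k = x - y.
    v = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            # Step down from diagonal k + 1 (an insert) or right from
            # diagonal k - 1 (a delete), whichever reaches further.
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]
            else:
                x = v[k - 1] + 1
            y = x - k
            # Follow the "snake": free diagonal moves over equal characters.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m


distance = myers_distance("abcabba", "cbabac")
```

The example strings are the ones used in the Myers paper cited in the docstring, where the shortest edit script has five edits.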
+  def diff_bisectSplit(self, text1, text2, x, y, deadline):
+    """Given the location of the 'middle snake', split the diff in two parts
+    and recurse.
+
+    Args:
+      text1: Old string to be diffed.
+      text2: New string to be diffed.
+      x: Index of split point in text1.
+      y: Index of split point in text2.
+      deadline: Time at which to bail if not yet complete.
+
+    Returns:
+      Array of diff tuples.
+    """
+    text1a = text1[:x]
+    text2a = text2[:y]
+    text1b = text1[x:]
+    text2b = text2[y:]
+
+    # Compute both diffs serially.
+    diffs = self.diff_main(text1a, text2a, False, deadline)
+    diffsb = self.diff_main(text1b, text2b, False, deadline)
+
+    return diffs + diffsb
+
+  def diff_linesToChars(self, text1, text2):
+    """Split two texts into an array of strings. Reduce the texts to a string
+    of hashes where each Unicode character represents one line.
+
+    Args:
+      text1: First string.
+      text2: Second string.
+
+    Returns:
+      Three element tuple, containing the encoded text1, the encoded text2 and
+      the array of unique strings. The zeroth element of the array of unique
+      strings is intentionally blank.
+    """
+    lineArray = []  # e.g. lineArray[4] == "Hello\n"
+    lineHash = {}   # e.g. lineHash["Hello\n"] == 4
+
+    # "\x00" is a valid character, but various debuggers don't like it.
+    # So we'll insert a junk entry to avoid generating a null character.
+    lineArray.append('')
+
+    def diff_linesToCharsMunge(text):
+      """Split a text into an array of strings. Reduce the texts to a string
+      of hashes where each Unicode character represents one line.
+      Modifies lineArray and lineHash through being a closure.
+
+      Args:
+        text: String to encode.
+
+      Returns:
+        Encoded string.
+      """
+      chars = []
+      # Walk the text, pulling out a substring for each line.
+      # text.split('\n') would temporarily double our memory footprint.
+      # Modifying text would create many large strings to garbage collect.
+      lineStart = 0
+      lineEnd = -1
+      while lineEnd < len(text) - 1:
+        lineEnd = text.find('\n', lineStart)
+        if lineEnd == -1:
+          lineEnd = len(text) - 1
+        line = text[lineStart:lineEnd + 1]
+
+        if line in lineHash:
+          chars.append(chr(lineHash[line]))
+        else:
+          if len(lineArray) == maxLines:
+            # Bail out at 1114111 because chr(1114112) throws.
+            line = text[lineStart:]
+            lineEnd = len(text)
+          lineArray.append(line)
+          lineHash[line] = len(lineArray) - 1
+          chars.append(chr(len(lineArray) - 1))
+        lineStart = lineEnd + 1
+      return "".join(chars)
+
+    # Allocate 2/3rds of the space for text1, the rest for text2.
+    maxLines = 666666
+    chars1 = diff_linesToCharsMunge(text1)
+    maxLines = 1114111
+    chars2 = diff_linesToCharsMunge(text2)
+    return (chars1, chars2, lineArray)
+
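Reviewer note: the line-to-character encoding used by `diff_linesToChars` can be reproduced in a few lines when the `maxLines` cap is left out. The sketch below makes that simplification and splits with a regular expression instead of the `find` loop, so it is an approximation of the method above for small inputs. The names `lines_to_chars` and `chars_to_lines` are illustrative:

```python
import re


def lines_to_chars(text1, text2):
    """Encode each distinct line as one character; return both encodings."""
    line_array = [""]  # index 0 is a placeholder so chr(0) is never produced
    line_hash = {}

    def encode(text):
        chars = []
        # Each match is one line including its trailing "\n", or a final
        # unterminated line.
        for line in re.findall(r"[^\n]*\n|[^\n]+", text):
            if line not in line_hash:
                line_array.append(line)
                line_hash[line] = len(line_array) - 1
            chars.append(chr(line_hash[line]))
        return "".join(chars)

    return encode(text1), encode(text2), line_array


def chars_to_lines(chars, line_array):
    """Inverse mapping: expand each character back into its line."""
    return "".join(line_array[ord(c)] for c in chars)


chars1, chars2, lines = lines_to_chars("alpha\nbeta\nalpha\n", "beta\nalpha\nbeta\n")
```

Diffing `chars1` against `chars2` then compares whole lines at the cost of comparing single characters, which is the point of the line-mode speedup.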
+  def diff_charsToLines(self, diffs, lineArray):
+    """Rehydrate the text in a diff from a string of line hashes to real lines
+    of text.
+
+    Args:
+      diffs: Array of diff tuples.
+      lineArray: Array of unique strings.
+    """
+    for i in range(len(diffs)):
+      text = []
+      for char in diffs[i][1]:
+        text.append(lineArray[ord(char)])
+      diffs[i] = (diffs[i][0], "".join(text))
+
+  def diff_commonPrefix(self, text1, text2):
+    """Determine the common prefix of two strings.
+
+    Args:
+      text1: First string.
+      text2: Second string.
+
+    Returns:
+      The number of characters common to the start of each string.
+    """
+    # Quick check for common null cases.
+    if not text1 or not text2 or text1[0] != text2[0]:
+      return 0
+    # Binary search.
+    # Performance analysis: https://neil.fraser.name/news/2007/10/09/
+    pointermin = 0
+    pointermax = min(len(text1), len(text2))
+    pointermid = pointermax
+    pointerstart = 0
+    while pointermin < pointermid:
+      if text1[pointerstart:pointermid] == text2[pointerstart:pointermid]:
+        pointermin = pointermid
+        pointerstart = pointermin
+      else:
+        pointermax = pointermid
+      pointermid = (pointermax - pointermin) // 2 + pointermin
+    return pointermid
+
+  def diff_commonSuffix(self, text1, text2):
+    """Determine the common suffix of two strings.
+
+    Args:
+      text1: First string.
+      text2: Second string.
+
+    Returns:
+      The number of characters common to the end of each string.
+    """
+    # Quick check for common null cases.
+    if not text1 or not text2 or text1[-1] != text2[-1]:
+      return 0
+    # Binary search.
+    # Performance analysis: https://neil.fraser.name/news/2007/10/09/
+    pointermin = 0
+    pointermax = min(len(text1), len(text2))
+    pointermid = pointermax
+    pointerend = 0
+    while pointermin < pointermid:
+      if (text1[-pointermid:len(text1) - pointerend] ==
+          text2[-pointermid:len(text2) - pointerend]):
+        pointermin = pointermid
+        pointerend = pointermin
+      else:
+        pointermax = pointermid
+      pointermid = (pointermax - pointermin) // 2 + pointermin
+    return pointermid
+
+  def diff_commonOverlap(self, text1, text2):
+    """Determine if the suffix of one string is the prefix of another.
+
+    Args:
+      text1: First string.
+      text2: Second string.
+
+    Returns:
+      The number of characters common to the end of the first
+      string and the start of the second string.
+    """
+    # Cache the text lengths to prevent multiple calls.
+    text1_length = len(text1)
+    text2_length = len(text2)
+    # Eliminate the null case.
+    if text1_length == 0 or text2_length == 0:
+      return 0
+    # Truncate the longer string.
+    if text1_length > text2_length:
+      text1 = text1[-text2_length:]
+    elif text1_length < text2_length:
+      text2 = text2[:text1_length]
+    text_length = min(text1_length, text2_length)
+    # Quick check for the worst case.
+    if text1 == text2:
+      return text_length
+
+    # Start by looking for a single character match
+    # and increase length until no match is found.
+    # Performance analysis: https://neil.fraser.name/news/2010/11/04/
+    best = 0
+    length = 1
+    while True:
+      pattern = text1[-length:]
+      found = text2.find(pattern)
+      if found == -1:
+        return best
+      length += found
+      if found == 0 or text1[-length:] == text2[:length]:
+        best = length
+        length += 1
+
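Reviewer note: a brute-force reference that computes the same quantity as `diff_commonOverlap` (the longest suffix of the first string that is also a prefix of the second) is handy for checking the faster `find`-based loop on small inputs. This is a deliberately naive sketch, and `common_overlap_naive` is a hypothetical name rather than anything in this file:

```python
def common_overlap_naive(text1, text2):
    """Length of the longest suffix of text1 that is a prefix of text2."""
    # Try the largest candidate first so the first hit is the answer.
    for size in range(min(len(text1), len(text2)), 0, -1):
        if text1.endswith(text2[:size]):
            return size
    return 0


overlap = common_overlap_naive("123456xxx", "xxxabcd")
```

It is quadratic in the worst case, which is the cost the loop above avoids by jumping ahead to the next place the current pattern occurs.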
+  def diff_halfMatch(self, text1, text2):
+    """Do the two texts share a substring which is at least half the length of
+    the longer text?
+    This speedup can produce non-minimal diffs.
+
+    Args:
+      text1: First string.
+      text2: Second string.
+
+    Returns:
+      Five element Array, containing the prefix of text1, the suffix of text1,
+      the prefix of text2, the suffix of text2 and the common middle. Or None
+      if there was no match.
+    """
+    if self.Diff_Timeout <= 0:
+      # Don't risk returning a non-optimal diff if we have unlimited time.
+      return None
+    if len(text1) > len(text2):
+      (longtext, shorttext) = (text1, text2)
+    else:
+      (shorttext, longtext) = (text1, text2)
+    if len(longtext) < 4 or len(shorttext) * 2 < len(longtext):
+      return None  # Pointless.
+
+    def diff_halfMatchI(longtext, shorttext, i):
+      """Does a substring of shorttext exist within longtext such that the
+      substring is at least half the length of longtext?
+      Closure, but does not reference any external variables.
+
+      Args:
+        longtext: Longer string.
+        shorttext: Shorter string.
+        i: Start index of quarter length substring within longtext.
+
+      Returns:
+        Five element Array, containing the prefix of longtext, the suffix of
+        longtext, the prefix of shorttext, the suffix of shorttext and the
+        common middle. Or None if there was no match.
+      """
+      seed = longtext[i:i + len(longtext) // 4]
+      best_common = ''
+      j = shorttext.find(seed)
+      while j != -1:
+        prefixLength = self.diff_commonPrefix(longtext[i:], shorttext[j:])
+        suffixLength = self.diff_commonSuffix(longtext[:i], shorttext[:j])
+        if len(best_common) < suffixLength + prefixLength:
+          best_common = (shorttext[j - suffixLength:j] +
+              shorttext[j:j + prefixLength])
+          best_longtext_a = longtext[:i - suffixLength]
+          best_longtext_b = longtext[i + prefixLength:]
+          best_shorttext_a = shorttext[:j - suffixLength]
+          best_shorttext_b = shorttext[j + prefixLength:]
+        j = shorttext.find(seed, j + 1)
+
+      if len(best_common) * 2 >= len(longtext):
+        return (best_longtext_a, best_longtext_b,
+                best_shorttext_a, best_shorttext_b, best_common)
+      else:
+        return None
+
+    # First check if the second quarter is the seed for a half-match.
+    hm1 = diff_halfMatchI(longtext, shorttext, (len(longtext) + 3) // 4)
+    # Check again based on the third quarter.
+    hm2 = diff_halfMatchI(longtext, shorttext, (len(longtext) + 1) // 2)
+    if not hm1 and not hm2:
+      return None
+    elif not hm2:
+      hm = hm1
+    elif not hm1:
+      hm = hm2
+    else:
+      # Both matched. Select the longest.
+      if len(hm1[4]) > len(hm2[4]):
+        hm = hm1
+      else:
+        hm = hm2
+
+    # A half-match was found, sort out the return data.
+    if len(text1) > len(text2):
+      (text1_a, text1_b, text2_a, text2_b, mid_common) = hm
+    else:
+      (text2_a, text2_b, text1_a, text1_b, mid_common) = hm
+    return (text1_a, text1_b, text2_a, text2_b, mid_common)
+
+  def diff_cleanupSemantic(self, diffs):
+    """Reduce the number of edits by eliminating semantically trivial
+    equalities.
+
+    Args:
+      diffs: Array of diff tuples.
+    """
+    changes = False
+    equalities = []  # Stack of indices where equalities are found.
+    lastEquality = None  # Always equal to diffs[equalities[-1]][1]
+    pointer = 0  # Index of current position.
+    # Number of chars that changed prior to the equality.
+    length_insertions1, length_deletions1 = 0, 0
+    # Number of chars that changed after the equality.
+    length_insertions2, length_deletions2 = 0, 0
+    while pointer < len(diffs):
+      if diffs[pointer][0] == self.DIFF_EQUAL:  # Equality found.
+        equalities.append(pointer)
+        length_insertions1, length_insertions2 = length_insertions2, 0
+        length_deletions1, length_deletions2 = length_deletions2, 0
+        lastEquality = diffs[pointer][1]
+      else:  # An insertion or deletion.
+        if diffs[pointer][0] == self.DIFF_INSERT:
+          length_insertions2 += len(diffs[pointer][1])
+        else:
+          length_deletions2 += len(diffs[pointer][1])
+        # Eliminate an equality that is smaller or equal to the edits on both
+        # sides of it.
+        if (lastEquality and (len(lastEquality) <=
+            max(length_insertions1, length_deletions1)) and
+            (len(lastEquality) <= max(length_insertions2, length_deletions2))):
+          # Duplicate record.
+          diffs.insert(equalities[-1], (self.DIFF_DELETE, lastEquality))
+          # Change second copy to insert.
+          diffs[equalities[-1] + 1] = (self.DIFF_INSERT,
+              diffs[equalities[-1] + 1][1])
+          # Throw away the equality we just deleted.
+          equalities.pop()
+          # Throw away the previous equality (it needs to be reevaluated).
+          if len(equalities):
+            equalities.pop()
+          if len(equalities):
+            pointer = equalities[-1]
+          else:
+            pointer = -1
+          # Reset the counters.
+          length_insertions1, length_deletions1 = 0, 0
+          length_insertions2, length_deletions2 = 0, 0
+          lastEquality = None
+          changes = True
+      pointer += 1
+
+    # Normalize the diff.
+    if changes:
+      self.diff_cleanupMerge(diffs)
+    self.diff_cleanupSemanticLossless(diffs)
+
+    # Find any overlaps between deletions and insertions.
+    # e.g: <del>abcxxx</del><ins>xxxdef</ins>
+    #   -> <del>abc</del>xxx<ins>def</ins>
+    # e.g: <del>xxxabc</del><ins>defxxx</ins>
+    #   -> <ins>def</ins>xxx<del>abc</del>
+    # Only extract an overlap if it is as big as the edit ahead or behind it.
+    pointer = 1
+    while pointer < len(diffs):
+      if (diffs[pointer - 1][0] == self.DIFF_DELETE and
+          diffs[pointer][0] == self.DIFF_INSERT):
+        deletion = diffs[pointer - 1][1]
+        insertion = diffs[pointer][1]
+        overlap_length1 = self.diff_commonOverlap(deletion, insertion)
+        overlap_length2 = self.diff_commonOverlap(insertion, deletion)
+        if overlap_length1 >= overlap_length2:
+          if (overlap_length1 >= len(deletion) / 2.0 or
+              overlap_length1 >= len(insertion) / 2.0):
+            # Overlap found. Insert an equality and trim the surrounding edits.
+            diffs.insert(pointer, (self.DIFF_EQUAL,
+                                   insertion[:overlap_length1]))
+            diffs[pointer - 1] = (self.DIFF_DELETE,
+                                  deletion[:len(deletion) - overlap_length1])
+            diffs[pointer + 1] = (self.DIFF_INSERT,
+                                  insertion[overlap_length1:])
+            pointer += 1
+        else:
+          if (overlap_length2 >= len(deletion) / 2.0 or
+              overlap_length2 >= len(insertion) / 2.0):
+            # Reverse overlap found.
+            # Insert an equality and swap and trim the surrounding edits.
+            diffs.insert(pointer, (self.DIFF_EQUAL, deletion[:overlap_length2]))
+            diffs[pointer - 1] = (self.DIFF_INSERT,
+                                  insertion[:len(insertion) - overlap_length2])
+            diffs[pointer + 1] = (self.DIFF_DELETE, deletion[overlap_length2:])
+            pointer += 1
+        pointer += 1
+      pointer += 1
+
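Reviewer note: the overlap extraction at the end of `diff_cleanupSemantic` can be tried on a single delete/insert pair. The sketch below handles exactly one pair and uses a brute-force overlap helper in place of `diff_commonOverlap`. Both function names are hypothetical, and the halving test is written with integer arithmetic instead of the `/ 2.0` comparison above:

```python
DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0


def overlap(a, b):
    """Length of the longest suffix of a that is a prefix of b (brute force)."""
    for size in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def split_overlap(deletion, insertion):
    """Pull a shared middle out of one delete/insert pair when it is large enough."""
    o1 = overlap(deletion, insertion)  # end of deletion == start of insertion
    o2 = overlap(insertion, deletion)  # end of insertion == start of deletion
    if o1 >= o2:
        if o1 * 2 >= len(deletion) or o1 * 2 >= len(insertion):
            return [(DIFF_DELETE, deletion[:len(deletion) - o1]),
                    (DIFF_EQUAL, insertion[:o1]),
                    (DIFF_INSERT, insertion[o1:])]
    elif o2 * 2 >= len(deletion) or o2 * 2 >= len(insertion):
        # Reverse overlap: the insert now comes first.
        return [(DIFF_INSERT, insertion[:len(insertion) - o2]),
                (DIFF_EQUAL, deletion[:o2]),
                (DIFF_DELETE, deletion[o2:])]
    # Overlap too small (or absent): leave the pair untouched.
    return [(DIFF_DELETE, deletion), (DIFF_INSERT, insertion)]


forward = split_overlap("abcxxx", "xxxdef")
reverse = split_overlap("xxxabc", "defxxx")
```

The two calls correspond to the two `<del>`/`<ins>` examples in the comment block above.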
+  def diff_cleanupSemanticLossless(self, diffs):
+    """Look for single edits surrounded on both sides by equalities
+    which can be shifted sideways to align the edit to a word boundary.
+    e.g: The c<ins>at c</ins>ame. -> The <ins>cat </ins>came.
+
+    Args:
+      diffs: Array of diff tuples.
+    """
+
+    def diff_cleanupSemanticScore(one, two):
+      """Given two strings, compute a score representing whether the
+      internal boundary falls on logical boundaries.
+      Scores range from 6 (best) to 0 (worst).
+      Closure, but does not reference any external variables.
+
+      Args:
+        one: First string.
+        two: Second string.
+
+      Returns:
+        The score.
+      """
+      if not one or not two:
+        # Edges are the best.
+        return 6
+
+      # Each port of this function behaves slightly differently due to
+      # subtle differences in each language's definition of things like
+      # 'whitespace'. Since this function's purpose is largely cosmetic,
+      # the choice has been made to use each language's native features
+      # rather than force total conformity.
+      char1 = one[-1]
+      char2 = two[0]
+      nonAlphaNumeric1 = not char1.isalnum()
+      nonAlphaNumeric2 = not char2.isalnum()
+      whitespace1 = nonAlphaNumeric1 and char1.isspace()
+      whitespace2 = nonAlphaNumeric2 and char2.isspace()
+      lineBreak1 = whitespace1 and (char1 == "\r" or char1 == "\n")
+      lineBreak2 = whitespace2 and (char2 == "\r" or char2 == "\n")
+      blankLine1 = lineBreak1 and self.BLANKLINEEND.search(one)
+      blankLine2 = lineBreak2 and self.BLANKLINESTART.match(two)
+
+      if blankLine1 or blankLine2:
+        # Five points for blank lines.
+        return 5
+      elif lineBreak1 or lineBreak2:
+        # Four points for line breaks.
+        return 4
+      elif nonAlphaNumeric1 and not whitespace1 and whitespace2:
+        # Three points for end of sentences.
+        return 3
+      elif whitespace1 or whitespace2:
+        # Two points for whitespace.
+        return 2
+      elif nonAlphaNumeric1 or nonAlphaNumeric2:
+        # One point for non-alphanumeric.
+        return 1
+      return 0
+
+    pointer = 1
+    # Intentionally ignore the first and last element (don't need checking).
+    while pointer < len(diffs) - 1:
+      if (diffs[pointer - 1][0] == self.DIFF_EQUAL and
+          diffs[pointer + 1][0] == self.DIFF_EQUAL):
+        # This is a single edit surrounded by equalities.
+        equality1 = diffs[pointer - 1][1]
+        edit = diffs[pointer][1]
+        equality2 = diffs[pointer + 1][1]
+
+        # First, shift the edit as far left as possible.
+        commonOffset = self.diff_commonSuffix(equality1, edit)
+        if commonOffset:
+          commonString = edit[-commonOffset:]
+          equality1 = equality1[:-commonOffset]
+          edit = commonString + edit[:-commonOffset]
+          equality2 = commonString + equality2
+
+        # Second, step character by character right, looking for the best fit.
+        bestEquality1 = equality1
+        bestEdit = edit
+        bestEquality2 = equality2
+        bestScore = (diff_cleanupSemanticScore(equality1, edit) +
+            diff_cleanupSemanticScore(edit, equality2))
+        while edit and equality2 and edit[0] == equality2[0]:
+          equality1 += edit[0]
+          edit = edit[1:] + equality2[0]
+          equality2 = equality2[1:]
+          score = (diff_cleanupSemanticScore(equality1, edit) +
+              diff_cleanupSemanticScore(edit, equality2))
+          # The >= encourages trailing rather than leading whitespace on edits.
826
+ if score >= bestScore:
827
+ bestScore = score
828
+ bestEquality1 = equality1
829
+ bestEdit = edit
830
+ bestEquality2 = equality2
831
+
832
+ if diffs[pointer - 1][1] != bestEquality1:
833
+ # We have an improvement, save it back to the diff.
834
+ if bestEquality1:
835
+ diffs[pointer - 1] = (diffs[pointer - 1][0], bestEquality1)
836
+ else:
837
+ del diffs[pointer - 1]
838
+ pointer -= 1
839
+ diffs[pointer] = (diffs[pointer][0], bestEdit)
840
+ if bestEquality2:
841
+ diffs[pointer + 1] = (diffs[pointer + 1][0], bestEquality2)
842
+ else:
843
+ del diffs[pointer + 1]
844
+ pointer -= 1
845
+ pointer += 1
846
+
847
+ # Define some regex patterns for matching boundaries.
848
+ BLANKLINEEND = re.compile(r"\n\r?\n$")
849
+ BLANKLINESTART = re.compile(r"^\r?\n\r?\n")
850
+
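The boundary scoring used by diff_cleanupSemanticLossless can be tried in isolation. The following is a minimal standalone sketch, not the class's own code: `boundary_score`, `BLANK_LINE_END` and `BLANK_LINE_START` are illustrative names standing in for the nested `diff_cleanupSemanticScore` and the two class-level patterns.

```python
import re

# Stand-ins (assumed names) for the class attributes BLANKLINEEND / BLANKLINESTART.
BLANK_LINE_END = re.compile(r"\n\r?\n$")
BLANK_LINE_START = re.compile(r"^\r?\n\r?\n")


def boundary_score(one, two):
    """Score the boundary between `one` and `two`: 6 is best, 0 is worst."""
    if not one or not two:
        return 6  # An edge of the text is the best possible boundary.
    char1, char2 = one[-1], two[0]
    non_alnum1 = not char1.isalnum()
    non_alnum2 = not char2.isalnum()
    space1 = non_alnum1 and char1.isspace()
    space2 = non_alnum2 and char2.isspace()
    break1 = space1 and char1 in "\r\n"
    break2 = space2 and char2 in "\r\n"
    blank1 = break1 and BLANK_LINE_END.search(one)
    blank2 = break2 and BLANK_LINE_START.match(two)
    if blank1 or blank2:
        return 5  # Blank line.
    if break1 or break2:
        return 4  # Line break.
    if non_alnum1 and not space1 and space2:
        return 3  # End of sentence (punctuation followed by whitespace).
    if space1 or space2:
        return 2  # Plain whitespace.
    if non_alnum1 or non_alnum2:
        return 1  # Other non-alphanumeric character.
    return 0  # Boundary falls inside a word.


scores = [
    boundary_score("The c", "at c"),      # inside a word
    boundary_score("The ", "cat "),       # whitespace
    boundary_score("end.", " Next"),      # end of sentence
    boundary_score("line\n", "next"),     # line break
    boundary_score("para\n\n", "next"),   # blank line
    boundary_score("", "start"),          # edge of the text
]
print(scores)
```

The lossless cleanup adds the scores of both boundaries of an edit and keeps the shift with the highest total, which is why "The <ins>cat </ins>came." wins over "The c<ins>at c</ins>ame.".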
851
+ def diff_cleanupEfficiency(self, diffs):
852
+ """Reduce the number of edits by eliminating operationally trivial
853
+ equalities.
854
+
855
+ Args:
856
+ diffs: Array of diff tuples.
857
+ """
858
+ changes = False
859
+ equalities = [] # Stack of indices where equalities are found.
860
+ lastEquality = None # Always equal to diffs[equalities[-1]][1]
861
+ pointer = 0 # Index of current position.
862
+ pre_ins = False # Is there an insertion operation before the last equality.
863
+ pre_del = False # Is there a deletion operation before the last equality.
864
+ post_ins = False # Is there an insertion operation after the last equality.
865
+ post_del = False # Is there a deletion operation after the last equality.
866
+ while pointer < len(diffs):
867
+ if diffs[pointer][0] == self.DIFF_EQUAL: # Equality found.
868
+ if (len(diffs[pointer][1]) < self.Diff_EditCost and
869
+ (post_ins or post_del)):
870
+ # Candidate found.
871
+ equalities.append(pointer)
872
+ pre_ins = post_ins
873
+ pre_del = post_del
874
+ lastEquality = diffs[pointer][1]
875
+ else:
876
+ # Not a candidate, and can never become one.
877
+ equalities = []
878
+ lastEquality = None
879
+
880
+ post_ins = post_del = False
881
+ else: # An insertion or deletion.
882
+ if diffs[pointer][0] == self.DIFF_DELETE:
883
+ post_del = True
884
+ else:
885
+ post_ins = True
886
+
887
+ # Five types to be split:
888
+ # <ins>A</ins><del>B</del>XY<ins>C</ins><del>D</del>
889
+ # <ins>A</ins>X<ins>C</ins><del>D</del>
890
+ # <ins>A</ins><del>B</del>X<ins>C</ins>
891
+ # <del>A</del>X<ins>C</ins><del>D</del>
892
+ # <ins>A</ins><del>B</del>X<del>C</del>
893
+
894
+ if lastEquality and ((pre_ins and pre_del and post_ins and post_del) or
895
+ ((len(lastEquality) < self.Diff_EditCost / 2) and
896
+ (pre_ins + pre_del + post_ins + post_del) == 3)):
897
+ # Duplicate record.
898
+ diffs.insert(equalities[-1], (self.DIFF_DELETE, lastEquality))
899
+ # Change second copy to insert.
900
+ diffs[equalities[-1] + 1] = (self.DIFF_INSERT,
901
+ diffs[equalities[-1] + 1][1])
902
+ equalities.pop() # Throw away the equality we just deleted.
903
+ lastEquality = None
904
+ if pre_ins and pre_del:
905
+ # No changes made which could affect previous entry, keep going.
906
+ post_ins = post_del = True
907
+ equalities = []
908
+ else:
909
+ if len(equalities):
910
+ equalities.pop() # Throw away the previous equality.
911
+ if len(equalities):
912
+ pointer = equalities[-1]
913
+ else:
914
+ pointer = -1
915
+ post_ins = post_del = False
916
+ changes = True
917
+ pointer += 1
918
+
919
+ if changes:
920
+ self.diff_cleanupMerge(diffs)
921
+
922
+ def diff_cleanupMerge(self, diffs):
923
+ """Reorder and merge like edit sections. Merge equalities.
924
+ Any edit section can move as long as it doesn't cross an equality.
925
+
926
+ Args:
927
+ diffs: Array of diff tuples.
928
+ """
929
+ diffs.append((self.DIFF_EQUAL, '')) # Add a dummy entry at the end.
930
+ pointer = 0
931
+ count_delete = 0
932
+ count_insert = 0
933
+ text_delete = ''
934
+ text_insert = ''
935
+ while pointer < len(diffs):
936
+ if diffs[pointer][0] == self.DIFF_INSERT:
937
+ count_insert += 1
938
+ text_insert += diffs[pointer][1]
939
+ pointer += 1
940
+ elif diffs[pointer][0] == self.DIFF_DELETE:
941
+ count_delete += 1
942
+ text_delete += diffs[pointer][1]
943
+ pointer += 1
944
+ elif diffs[pointer][0] == self.DIFF_EQUAL:
945
+ # Upon reaching an equality, check for prior redundancies.
946
+ if count_delete + count_insert > 1:
947
+ if count_delete != 0 and count_insert != 0:
948
+ # Factor out any common prefixes.
949
+ commonlength = self.diff_commonPrefix(text_insert, text_delete)
950
+ if commonlength != 0:
951
+ x = pointer - count_delete - count_insert - 1
952
+ if x >= 0 and diffs[x][0] == self.DIFF_EQUAL:
953
+ diffs[x] = (diffs[x][0], diffs[x][1] +
954
+ text_insert[:commonlength])
955
+ else:
956
+ diffs.insert(0, (self.DIFF_EQUAL, text_insert[:commonlength]))
957
+ pointer += 1
958
+ text_insert = text_insert[commonlength:]
959
+ text_delete = text_delete[commonlength:]
960
+ # Factor out any common suffixes.
961
+ commonlength = self.diff_commonSuffix(text_insert, text_delete)
962
+ if commonlength != 0:
963
+ diffs[pointer] = (diffs[pointer][0], text_insert[-commonlength:] +
964
+ diffs[pointer][1])
965
+ text_insert = text_insert[:-commonlength]
966
+ text_delete = text_delete[:-commonlength]
967
+ # Delete the offending records and add the merged ones.
968
+ new_ops = []
969
+ if len(text_delete) != 0:
970
+ new_ops.append((self.DIFF_DELETE, text_delete))
971
+ if len(text_insert) != 0:
972
+ new_ops.append((self.DIFF_INSERT, text_insert))
973
+ pointer -= count_delete + count_insert
974
+ diffs[pointer : pointer + count_delete + count_insert] = new_ops
975
+ pointer += len(new_ops) + 1
976
+ elif pointer != 0 and diffs[pointer - 1][0] == self.DIFF_EQUAL:
977
+ # Merge this equality with the previous one.
978
+ diffs[pointer - 1] = (diffs[pointer - 1][0],
979
+ diffs[pointer - 1][1] + diffs[pointer][1])
980
+ del diffs[pointer]
981
+ else:
982
+ pointer += 1
983
+
984
+ count_insert = 0
985
+ count_delete = 0
986
+ text_delete = ''
987
+ text_insert = ''
988
+
989
+ if diffs[-1][1] == '':
990
+ diffs.pop() # Remove the dummy entry at the end.
991
+
992
+ # Second pass: look for single edits surrounded on both sides by equalities
993
+ # which can be shifted sideways to eliminate an equality.
994
+ # e.g: A<ins>BA</ins>C -> <ins>AB</ins>AC
995
+ changes = False
996
+ pointer = 1
997
+ # Intentionally ignore the first and last element (don't need checking).
998
+ while pointer < len(diffs) - 1:
999
+ if (diffs[pointer - 1][0] == self.DIFF_EQUAL and
1000
+ diffs[pointer + 1][0] == self.DIFF_EQUAL):
1001
+ # This is a single edit surrounded by equalities.
1002
+ if diffs[pointer][1].endswith(diffs[pointer - 1][1]):
1003
+ # Shift the edit over the previous equality.
1004
+ if diffs[pointer - 1][1] != "":
1005
+ diffs[pointer] = (diffs[pointer][0],
1006
+ diffs[pointer - 1][1] +
1007
+ diffs[pointer][1][:-len(diffs[pointer - 1][1])])
1008
+ diffs[pointer + 1] = (diffs[pointer + 1][0],
1009
+ diffs[pointer - 1][1] + diffs[pointer + 1][1])
1010
+ del diffs[pointer - 1]
1011
+ changes = True
1012
+ elif diffs[pointer][1].startswith(diffs[pointer + 1][1]):
1013
+ # Shift the edit over the next equality.
1014
+ diffs[pointer - 1] = (diffs[pointer - 1][0],
1015
+ diffs[pointer - 1][1] + diffs[pointer + 1][1])
1016
+ diffs[pointer] = (diffs[pointer][0],
1017
+ diffs[pointer][1][len(diffs[pointer + 1][1]):] +
1018
+ diffs[pointer + 1][1])
1019
+ del diffs[pointer + 1]
1020
+ changes = True
1021
+ pointer += 1
1022
+
1023
+ # If shifts were made, the diff needs reordering and another shift sweep.
1024
+ if changes:
1025
+ self.diff_cleanupMerge(diffs)
1026
+
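The second pass of diff_cleanupMerge slides a single edit over a neighbouring equality when the edit ends with the previous equality or starts with the next one. A minimal sketch of that one step follows, assuming non-empty equalities and the usual operation codes (-1 delete, 1 insert, 0 equal); `slide_single_edit` is a hypothetical helper, not part of the file.

```python
# Assumed operation codes; the class defines its own DIFF_* constants.
DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0


def slide_single_edit(equality1, edit, equality2):
    """Slide `edit` over one of the equalities around it, if that removes one.

    `edit` is an (op, text) tuple; `equality1` and `equality2` are the plain
    strings of the surrounding equalities. Returns the resulting diff list.
    """
    op, text = edit
    if equality1 and text.endswith(equality1):
        # Shift the edit left, over the previous equality.
        shifted = equality1 + text[:-len(equality1)]
        return [(op, shifted), (DIFF_EQUAL, equality1 + equality2)]
    if equality2 and text.startswith(equality2):
        # Shift the edit right, over the next equality.
        shifted = text[len(equality2):] + equality2
        return [(DIFF_EQUAL, equality1 + equality2), (op, shifted)]
    # Nothing to slide: keep all three entries.
    return [(DIFF_EQUAL, equality1), edit, (DIFF_EQUAL, equality2)]


left = slide_single_edit("A", (DIFF_INSERT, "BA"), "C")
right = slide_single_edit("c", (DIFF_INSERT, "ab"), "a")
print(left)   # A<ins>BA</ins>C becomes <ins>AB</ins>AC
print(right)  # c<ins>ab</ins>a becomes ca<ins>ba</ins>
```

Both texts are unchanged by the slide; only the number of diff entries drops from three to two.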
1027
+ def diff_xIndex(self, diffs, loc):
1028
+ """loc is a location in text1, compute and return the equivalent location
1029
+ in text2. e.g. "The cat" vs "The big cat", 1->1, 5->9
1030
+
1031
+ Args:
1032
+ diffs: Array of diff tuples.
1033
+ loc: Location within text1.
1034
+
1035
+ Returns:
1036
+ Location within text2.
1037
+ """
1038
+ chars1 = 0
1039
+ chars2 = 0
1040
+ last_chars1 = 0
1041
+ last_chars2 = 0
1042
+ for x in range(len(diffs)):
1043
+ (op, text) = diffs[x]
1044
+ if op != self.DIFF_INSERT: # Equality or deletion.
1045
+ chars1 += len(text)
1046
+ if op != self.DIFF_DELETE: # Equality or insertion.
1047
+ chars2 += len(text)
1048
+ if chars1 > loc: # Overshot the location.
1049
+ break
1050
+ last_chars1 = chars1
1051
+ last_chars2 = chars2
1052
+
1053
+ if len(diffs) != x and diffs[x][0] == self.DIFF_DELETE:
1054
+ # The location was deleted.
1055
+ return last_chars2
1056
+ # Add the remaining character length.
1057
+ return last_chars2 + (loc - last_chars1)
1058
+
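The index translation in diff_xIndex can be checked by hand on a small diff. This is a standalone sketch under two assumptions: the operation codes are -1, 1 and 0, and `loc` lies within text1. `x_index` is an illustrative name.

```python
# Assumed operation codes; the class defines its own DIFF_* constants.
DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0


def x_index(diffs, loc):
    """Translate position `loc` in text1 to the matching position in text2."""
    chars1 = chars2 = 0            # Characters consumed so far in text1 / text2.
    last_chars1 = last_chars2 = 0  # Same counters before the current diff entry.
    overshoot_op = None
    for op, text in diffs:
        if op != DIFF_INSERT:      # Equality or deletion advances text1.
            chars1 += len(text)
        if op != DIFF_DELETE:      # Equality or insertion advances text2.
            chars2 += len(text)
        if chars1 > loc:           # This entry contains the location.
            overshoot_op = op
            break
        last_chars1, last_chars2 = chars1, chars2
    if overshoot_op == DIFF_DELETE:
        # The location was deleted; map it to where the deletion started.
        return last_chars2
    return last_chars2 + (loc - last_chars1)


diffs = [(DIFF_EQUAL, "The "), (DIFF_INSERT, "big "), (DIFF_EQUAL, "cat")]
print(x_index(diffs, 1), x_index(diffs, 5))
```

For "The cat" vs "The big cat", position 5 (the "a" of "cat") moves right by the four inserted characters.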
1059
+ def diff_prettyHtml(self, diffs):
1060
+ """Convert a diff array into a pretty HTML report.
1061
+
1062
+ Args:
1063
+ diffs: Array of diff tuples.
1064
+
1065
+ Returns:
1066
+ HTML representation.
1067
+ """
1068
+ html = []
1069
+ for (op, data) in diffs:
1070
+ text = (data.replace("&", "&amp;").replace("<", "&lt;")
1071
+ .replace(">", "&gt;").replace("\n", "&para;<br>"))
1072
+ if op == self.DIFF_INSERT:
1073
+ html.append("<ins style=\"background:#e6ffe6;\">%s</ins>" % text)
1074
+ elif op == self.DIFF_DELETE:
1075
+ html.append("<del style=\"background:#ffe6e6;\">%s</del>" % text)
1076
+ elif op == self.DIFF_EQUAL:
1077
+ html.append("<span>%s</span>" % text)
1078
+ return "".join(html)
1079
+
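The order of the replacements in diff_prettyHtml matters: "&" has to be escaped first, otherwise the ampersands of the later entities would be escaped again. A standalone sketch of the same rendering, with assumed operation codes and an illustrative function name:

```python
# Assumed operation codes; the class defines its own DIFF_* constants.
DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0


def pretty_html(diffs):
    """Render a list of (op, text) tuples as an HTML fragment."""
    parts = []
    for op, data in diffs:
        # Escape "&" first so the entities added afterwards stay intact.
        text = (data.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\n", "&para;<br>"))
        if op == DIFF_INSERT:
            parts.append('<ins style="background:#e6ffe6;">%s</ins>' % text)
        elif op == DIFF_DELETE:
            parts.append('<del style="background:#ffe6e6;">%s</del>' % text)
        else:
            parts.append("<span>%s</span>" % text)
    return "".join(parts)


html = pretty_html([
    (DIFF_EQUAL, "a\n"),
    (DIFF_DELETE, "<B>b</B>"),
    (DIFF_INSERT, "c&d"),
])
print(html)
```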
1080
+ def diff_text1(self, diffs):
1081
+ """Compute and return the source text (all equalities and deletions).
1082
+
1083
+ Args:
1084
+ diffs: Array of diff tuples.
1085
+
1086
+ Returns:
1087
+ Source text.
1088
+ """
1089
+ text = []
1090
+ for (op, data) in diffs:
1091
+ if op != self.DIFF_INSERT:
1092
+ text.append(data)
1093
+ return "".join(text)
1094
+
1095
+ def diff_text2(self, diffs):
1096
+ """Compute and return the destination text (all equalities and insertions).
1097
+
1098
+ Args:
1099
+ diffs: Array of diff tuples.
1100
+
1101
+ Returns:
1102
+ Destination text.
1103
+ """
1104
+ text = []
1105
+ for (op, data) in diffs:
1106
+ if op != self.DIFF_DELETE:
1107
+ text.append(data)
1108
+ return "".join(text)
1109
+
1110
+ def diff_levenshtein(self, diffs):
1111
+ """Compute the Levenshtein distance; the number of inserted, deleted or
1112
+ substituted characters.
1113
+
1114
+ Args:
1115
+ diffs: Array of diff tuples.
1116
+
1117
+ Returns:
1118
+ Number of changes.
1119
+ """
1120
+ levenshtein = 0
1121
+ insertions = 0
1122
+ deletions = 0
1123
+ for (op, data) in diffs:
1124
+ if op == self.DIFF_INSERT:
1125
+ insertions += len(data)
1126
+ elif op == self.DIFF_DELETE:
1127
+ deletions += len(data)
1128
+ elif op == self.DIFF_EQUAL:
1129
+ # A deletion and an insertion is one substitution.
1130
+ levenshtein += max(insertions, deletions)
1131
+ insertions = 0
1132
+ deletions = 0
1133
+ levenshtein += max(insertions, deletions)
1134
+ return levenshtein
1135
+
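The counting rule in diff_levenshtein pairs each run of deletions with the adjacent run of insertions and charges the larger of the two, since a delete plus an insert of one character is a single substitution. A minimal standalone sketch, with assumed operation codes and an illustrative function name:

```python
# Assumed operation codes; the class defines its own DIFF_* constants.
DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0


def levenshtein_from_diffs(diffs):
    """Count inserted, deleted or substituted characters in a diff list."""
    total = insertions = deletions = 0
    for op, data in diffs:
        if op == DIFF_INSERT:
            insertions += len(data)
        elif op == DIFF_DELETE:
            deletions += len(data)
        else:
            # An equality closes the current edit run.
            total += max(insertions, deletions)
            insertions = deletions = 0
    # Account for an edit run at the very end of the diff.
    return total + max(insertions, deletions)


adjacent = levenshtein_from_diffs(
    [(DIFF_DELETE, "abc"), (DIFF_INSERT, "1234"), (DIFF_EQUAL, "xyz")])
separated = levenshtein_from_diffs(
    [(DIFF_DELETE, "abc"), (DIFF_EQUAL, "xyz"), (DIFF_INSERT, "1234")])
print(adjacent, separated)
```

When the deletion and insertion are adjacent they overlap as substitutions (cost 4); when an equality separates them they are charged independently (cost 3 + 4).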
1136
+ def diff_toDelta(self, diffs):
1137
+ """Crush the diff into an encoded string which describes the operations
1138
+ required to transform text1 into text2.
1139
+ E.g. =3\t-2\t+ing -> Keep 3 chars, delete 2 chars, insert 'ing'.
1140
+ Operations are tab-separated. Inserted text is escaped using %xx notation.
1141
+
1142
+ Args:
1143
+ diffs: Array of diff tuples.
1144
+
1145
+ Returns:
1146
+ Delta text.
1147
+ """
1148
+ text = []
1149
+ for (op, data) in diffs:
1150
+ if op == self.DIFF_INSERT:
1151
+ # Encode to UTF-8 bytes so non-ASCII characters are percent-escaped.
1152
+ data = data.encode("utf-8")
1153
+ text.append("+" + urllib.parse.quote(data, "!~*'();/?:@&=+$,# "))
1154
+ elif op == self.DIFF_DELETE:
1155
+ text.append("-%d" % len(data))
1156
+ elif op == self.DIFF_EQUAL:
1157
+ text.append("=%d" % len(data))
1158
+ return "\t".join(text)
1159
+
1160
+ def diff_fromDelta(self, text1, delta):
1161
+ """Given the original text1, and an encoded string which describes the
1162
+ operations required to transform text1 into text2, compute the full diff.
1163
+
1164
+ Args:
1165
+ text1: Source string for the diff.
1166
+ delta: Delta text.
1167
+
1168
+ Returns:
1169
+ Array of diff tuples.
1170
+
1171
+ Raises:
1172
+ ValueError: If invalid input.
1173
+ """
1174
+ diffs = []
1175
+ pointer = 0 # Cursor in text1
1176
+ tokens = delta.split("\t")
1177
+ for token in tokens:
1178
+ if token == "":
1179
+ # Blank tokens are ok (from a trailing \t).
1180
+ continue
1181
+ # Each token begins with a one character parameter which specifies the
1182
+ # operation of this token (delete, insert, equality).
1183
+ param = token[1:]
1184
+ if token[0] == "+":
1185
+ param = urllib.parse.unquote(param)
1186
+ diffs.append((self.DIFF_INSERT, param))
1187
+ elif token[0] == "-" or token[0] == "=":
1188
+ try:
1189
+ n = int(param)
1190
+ except ValueError:
1191
+ raise ValueError("Invalid number in diff_fromDelta: " + param)
1192
+ if n < 0:
1193
+ raise ValueError("Negative number in diff_fromDelta: " + param)
1194
+ text = text1[pointer : pointer + n]
1195
+ pointer += n
1196
+ if token[0] == "=":
1197
+ diffs.append((self.DIFF_EQUAL, text))
1198
+ else:
1199
+ diffs.append((self.DIFF_DELETE, text))
1200
+ else:
1201
+ # Anything else is an error.
1202
+ raise ValueError("Invalid diff operation in diff_fromDelta: " +
1203
+ token[0])
1204
+ if pointer != len(text1):
1205
+ raise ValueError(
1206
+ "Delta length (%d) does not equal source text length (%d)." %
1207
+ (pointer, len(text1)))
1208
+ return diffs
1209
+
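The delta format produced by diff_toDelta and parsed by diff_fromDelta can be exercised as a round trip. The sketch below is a simplified standalone version, not the class's methods: the operation codes are assumed, the function names are illustrative, and error messages are shortened.

```python
import urllib.parse

# Assumed operation codes; the class defines its own DIFF_* constants.
DIFF_DELETE, DIFF_INSERT, DIFF_EQUAL = -1, 1, 0
SAFE = "!~*'();/?:@&=+$,# "  # Characters left unescaped in inserted text.


def to_delta(diffs):
    """Encode a diff list as tab-separated tokens: =n keep, -n delete, +text."""
    tokens = []
    for op, data in diffs:
        if op == DIFF_INSERT:
            tokens.append("+" + urllib.parse.quote(data, safe=SAFE))
        elif op == DIFF_DELETE:
            tokens.append("-%d" % len(data))
        else:
            tokens.append("=%d" % len(data))
    return "\t".join(tokens)


def from_delta(text1, delta):
    """Rebuild the full diff list from text1 and a delta string."""
    diffs = []
    pointer = 0  # Cursor in text1.
    for token in delta.split("\t"):
        if not token:
            continue  # Blank tokens are fine (e.g. from a trailing tab).
        op, param = token[0], token[1:]
        if op == "+":
            diffs.append((DIFF_INSERT, urllib.parse.unquote(param)))
        elif op in "-=":
            n = int(param)  # Raises ValueError on a malformed number.
            if n < 0:
                raise ValueError("Negative number in delta: " + param)
            chunk = text1[pointer:pointer + n]
            pointer += n
            diffs.append((DIFF_EQUAL if op == "=" else DIFF_DELETE, chunk))
        else:
            raise ValueError("Invalid diff operation in delta: " + op)
    if pointer != len(text1):
        raise ValueError("Delta length does not equal source text length.")
    return diffs


diffs = [
    (DIFF_EQUAL, "jump"), (DIFF_DELETE, "s"), (DIFF_INSERT, "ed"),
    (DIFF_EQUAL, " over "), (DIFF_DELETE, "the"), (DIFF_INSERT, "a"),
    (DIFF_EQUAL, " lazy"), (DIFF_INSERT, "old dog"),
]
delta = to_delta(diffs)
restored = from_delta("jumps over the lazy", delta)
print(repr(delta))
```

Only inserted text is carried in the delta; kept and deleted spans are stored as lengths, so the receiver must already hold text1.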
1210
+ # MATCH FUNCTIONS
1211
+
1212
+ def match_main(self, text, pattern, loc):
1213
+ """Locate the best instance of 'pattern' in 'text' near 'loc'.
1214
+
1215
+ Args:
1216
+ text: The text to search.
1217
+ pattern: The pattern to search for.
1218
+ loc: The location to search around.
1219
+
1220
+ Returns:
1221
+ Best match index or -1.
1222
+ """
1223
+ # Check for null inputs.
1224
+ if text is None or pattern is None:
1225
+ raise ValueError("Null inputs. (match_main)")
1226
+
1227
+ loc = max(0, min(loc, len(text)))
1228
+ if text == pattern:
1229
+ # Shortcut (potentially not guaranteed by the algorithm)
1230
+ return 0
1231
+ elif not text:
1232
+ # Nothing to match.
1233
+ return -1
1234
+ elif text[loc:loc + len(pattern)] == pattern:
1235
+ # Perfect match at the perfect spot! (Includes case of null pattern)
1236
+ return loc
1237
+ else:
1238
+ # Do a fuzzy compare.
1239
+ match = self.match_bitap(text, pattern, loc)
1240
+ return match
1241
+
1242
+ def match_bitap(self, text, pattern, loc):
1243
+ """Locate the best instance of 'pattern' in 'text' near 'loc' using the
1244
+ Bitap algorithm.
1245
+
1246
+ Args:
1247
+ text: The text to search.
1248
+ pattern: The pattern to search for.
1249
+ loc: The location to search around.
1250
+
1251
+ Returns:
1252
+ Best match index or -1.
1253
+ """
1254
+ # Python doesn't have a maxint limit, so ignore this check.
1255
+ #if self.Match_MaxBits != 0 and len(pattern) > self.Match_MaxBits:
1256
+ # raise ValueError("Pattern too long for this application.")
1257
+
1258
+ # Initialise the alphabet.
1259
+ s = self.match_alphabet(pattern)
1260
+
1261
+ def match_bitapScore(e, x):
1262
+ """Compute and return the score for a match with e errors and x location.
1263
+ Accesses loc and pattern through being a closure.
1264
+
1265
+ Args:
1266
+ e: Number of errors in match.
1267
+ x: Location of match.
1268
+
1269
+ Returns:
1270
+ Overall score for match (0.0 = good, 1.0 = bad).
1271
+ """
1272
+ accuracy = float(e) / len(pattern)
1273
+ proximity = abs(loc - x)
1274
+ if not self.Match_Distance:
1275
+ # Dodge divide by zero error.
1276
+ return 1.0 if proximity else accuracy
1277
+ return accuracy + (proximity / float(self.Match_Distance))
1278
+
1279
+ # Highest score beyond which we give up.
1280
+ score_threshold = self.Match_Threshold
1281
+ # Is there a nearby exact match? (speedup)
1282
+ best_loc = text.find(pattern, loc)
1283
+ if best_loc != -1:
1284
+ score_threshold = min(match_bitapScore(0, best_loc), score_threshold)
1285
+ # What about in the other direction? (speedup)
1286
+ best_loc = text.rfind(pattern, loc + len(pattern))
1287
+ if best_loc != -1:
1288
+ score_threshold = min(match_bitapScore(0, best_loc), score_threshold)
1289
+
1290
+ # Initialise the bit arrays.
1291
+ matchmask = 1 << (len(pattern) - 1)
1292
+ best_loc = -1
1293
+
1294
+ bin_max = len(pattern) + len(text)
1295
+ # Empty initialization added to appease pychecker.
1296
+ last_rd = None
1297
+ for d in range(len(pattern)):
1298
+ # Scan for the best match; each iteration allows for one more error.
1299
+ # Run a binary search to determine how far from 'loc' we can stray at
1300
+ # this error level.
1301
+ bin_min = 0
1302
+ bin_mid = bin_max
1303
+ while bin_min < bin_mid:
1304
+ if match_bitapScore(d, loc + bin_mid) <= score_threshold:
1305
+ bin_min = bin_mid
1306
+ else:
1307
+ bin_max = bin_mid
1308
+ bin_mid = (bin_max - bin_min) // 2 + bin_min
1309
+
1310
+ # Use the result from this iteration as the maximum for the next.
1311
+ bin_max = bin_mid
1312
+ start = max(1, loc - bin_mid + 1)
1313
+ finish = min(loc + bin_mid, len(text)) + len(pattern)
1314
+
1315
+ rd = [0] * (finish + 2)
1316
+ rd[finish + 1] = (1 << d) - 1
1317
+ for j in range(finish, start - 1, -1):
1318
+ if len(text) <= j - 1:
1319
+ # Out of range.
1320
+ charMatch = 0
1321
+ else:
1322
+ charMatch = s.get(text[j - 1], 0)
1323
+ if d == 0: # First pass: exact match.
1324
+ rd[j] = ((rd[j + 1] << 1) | 1) & charMatch
1325
+ else: # Subsequent passes: fuzzy match.
1326
+ rd[j] = (((rd[j + 1] << 1) | 1) & charMatch) | (
1327
+ ((last_rd[j + 1] | last_rd[j]) << 1) | 1) | last_rd[j + 1]
1328
+ if rd[j] & matchmask:
1329
+ score = match_bitapScore(d, j - 1)
1330
+ # This match will almost certainly be better than any existing match.
1331
+ # But check anyway.
1332
+ if score <= score_threshold:
1333
+ # Told you so.
1334
+ score_threshold = score
1335
+ best_loc = j - 1
1336
+ if best_loc > loc:
1337
+ # When passing loc, don't exceed our current distance from loc.
1338
+ start = max(1, 2 * loc - best_loc)
1339
+ else:
1340
+ # Already passed loc, downhill from here on in.
1341
+ break
1342
+ # No hope for a (better) match at greater error levels.
1343
+ if match_bitapScore(d + 1, loc) > score_threshold:
1344
+ break
1345
+ last_rd = rd
1346
+ return best_loc
1347
+
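The score that match_bitap minimises combines the error rate with the distance from the expected location. Below is a standalone sketch of that formula; `bitap_score` is an illustrative name, and the default `match_distance=1000` is an assumption here, since the class's Match_Distance setting is defined outside this part of the file.

```python
def bitap_score(errors, x, loc, pattern_len, match_distance=1000):
    """Score a candidate match: 0.0 is a perfect match, larger is worse.

    errors:         number of mismatches in the candidate
    x:              position of the candidate in the text
    loc:            position where the match was expected
    match_distance: how far from loc a match may stray before the
                    distance penalty alone reaches 1.0 (assumed default)
    """
    accuracy = errors / pattern_len
    proximity = abs(loc - x)
    if not match_distance:
        # Avoid dividing by zero: any displacement counts as a total miss.
        return 1.0 if proximity else accuracy
    return accuracy + proximity / match_distance


exact_here = bitap_score(0, x=10, loc=10, pattern_len=8)
fuzzy_here = bitap_score(2, x=10, loc=10, pattern_len=8)
exact_far = bitap_score(0, x=510, loc=10, pattern_len=8)
print(exact_here, fuzzy_here, exact_far)
```

Under these assumptions an exact match 500 characters away scores worse than a match at the expected spot with two errors in eight characters, which is the trade-off the threshold check in match_bitap relies on.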
1348
+ def match_alphabet(self, pattern):
1349
+ """Initialise the alphabet for the Bitap algorithm.
1350
+
1351
+ Args:
1352
+ pattern: The text to encode.
1353
+
1354
+ Returns:
1355
+ Hash of character locations.
1356
+ """
1357
+ s = {}
1358
+ for char in pattern:
1359
+ s[char] = 0
1360
+ for i in range(len(pattern)):
1361
+ s[pattern[i]] |= 1 << (len(pattern) - i - 1)
1362
+ return s
1363
+
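The alphabet built by match_alphabet maps each pattern character to a bitmask with one bit per position, the first character of the pattern occupying the highest bit. A minimal standalone sketch with an illustrative function name:

```python
def bitap_alphabet(pattern):
    """Map each character to a bitmask of the positions where it occurs."""
    masks = {}
    for i, char in enumerate(pattern):
        # Position 0 gets the highest bit, the last position gets bit 0.
        masks[char] = masks.get(char, 0) | (1 << (len(pattern) - i - 1))
    return masks


unique = bitap_alphabet("abc")
repeated = bitap_alphabet("abcaba")
print(unique, repeated)
print(format(repeated["a"], "06b"))  # Bits set at positions 0, 3 and 5.
```

match_bitap looks up these masks with a default of 0, so text characters that do not occur in the pattern clear every bit of the running state.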
1364
+ # PATCH FUNCTIONS
1365
+
1366
+ def patch_addContext(self, patch, text):
1367
+ """Increase the context until it is unique,
1368
+ but don't let the pattern expand beyond Match_MaxBits.
1369
+
1370
+ Args:
1371
+ patch: The patch to grow.
1372
+ text: Source text.
1373
+ """
1374
+ if len(text) == 0:
1375
+ return
1376
+ pattern = text[patch.start2 : patch.start2 + patch.length1]
1377
+ padding = 0
1378
+
1379
+ # Look for the first and last matches of pattern in text. If two different
1380
+ # matches are found, increase the pattern length.
1381
+ while (text.find(pattern) != text.rfind(pattern) and (self.Match_MaxBits ==
1382
+ 0 or len(pattern) < self.Match_MaxBits - self.Patch_Margin -
1383
+ self.Patch_Margin)):
1384
+ padding += self.Patch_Margin
1385
+ pattern = text[max(0, patch.start2 - padding) :
1386
+ patch.start2 + patch.length1 + padding]
1387
+ # Add one chunk for good luck.
1388
+ padding += self.Patch_Margin
1389
+
1390
+ # Add the prefix.
1391
+ prefix = text[max(0, patch.start2 - padding) : patch.start2]
1392
+ if prefix:
1393
+ patch.diffs[:0] = [(self.DIFF_EQUAL, prefix)]
1394
+ # Add the suffix.
1395
+ suffix = text[patch.start2 + patch.length1 :
1396
+ patch.start2 + patch.length1 + padding]
1397
+ if suffix:
1398
+ patch.diffs.append((self.DIFF_EQUAL, suffix))
1399
+
1400
+ # Roll back the start points.
1401
+ patch.start1 -= len(prefix)
1402
+ patch.start2 -= len(prefix)
1403
+ # Extend lengths.
1404
+ patch.length1 += len(prefix) + len(suffix)
1405
+ patch.length2 += len(prefix) + len(suffix)
1406
+
1407
+ def patch_make(self, a, b=None, c=None):
1408
+ """Compute a list of patches to turn text1 into text2.
1409
+ Use diffs if provided, otherwise compute it ourselves.
1410
+ There are four ways to call this function, depending on what data is
1411
+ available to the caller:
1412
+ Method 1:
1413
+ a = text1, b = text2
1414
+ Method 2:
1415
+ a = diffs
1416
+ Method 3 (optimal):
1417
+ a = text1, b = diffs
1418
+ Method 4 (deprecated, use method 3):
1419
+ a = text1, b = text2, c = diffs
1420
+
1421
+ Args:
1422
+ a: text1 (methods 1,3,4) or Array of diff tuples for text1 to
1423
+ text2 (method 2).
1424
+ b: text2 (methods 1,4) or Array of diff tuples for text1 to
1425
+ text2 (method 3) or undefined (method 2).
1426
+ c: Array of diff tuples for text1 to text2 (method 4) or
1427
+ undefined (methods 1,2,3).
1428
+
1429
+ Returns:
1430
+ Array of Patch objects.
1431
+ """
1432
+ text1 = None
1433
+ diffs = None
1434
+ if isinstance(a, str) and isinstance(b, str) and c is None:
1435
+ # Method 1: text1, text2
1436
+ # Compute diffs from text1 and text2.
1437
+ text1 = a
1438
+ diffs = self.diff_main(text1, b, True)
1439
+ if len(diffs) > 2:
1440
+ self.diff_cleanupSemantic(diffs)
1441
+ self.diff_cleanupEfficiency(diffs)
1442
+ elif isinstance(a, list) and b is None and c is None:
1443
+ # Method 2: diffs
1444
+ # Compute text1 from diffs.
1445
+ diffs = a
1446
+ text1 = self.diff_text1(diffs)
1447
+ elif isinstance(a, str) and isinstance(b, list) and c is None:
1448
+ # Method 3: text1, diffs
1449
+ text1 = a
1450
+ diffs = b
1451
+ elif (isinstance(a, str) and isinstance(b, str) and
1452
+ isinstance(c, list)):
1453
+ # Method 4: text1, text2, diffs
1454
+ # text2 is not used.
1455
+ text1 = a
1456
+ diffs = c
1457
+ else:
1458
+ raise ValueError("Unknown call format to patch_make.")
1459
+
1460
+ if not diffs:
1461
+ return [] # Get rid of the None case.
1462
+ patches = []
1463
+ patch = patch_obj()
1464
+ char_count1 = 0 # Number of characters into the text1 string.
1465
+ char_count2 = 0 # Number of characters into the text2 string.
1466
+ prepatch_text = text1 # Recreate the patches to determine context info.
1467
+ postpatch_text = text1
1468
+ for x in range(len(diffs)):
1469
+ (diff_type, diff_text) = diffs[x]
1470
+ if len(patch.diffs) == 0 and diff_type != self.DIFF_EQUAL:
1471
+ # A new patch starts here.
1472
+ patch.start1 = char_count1
1473
+ patch.start2 = char_count2
1474
+ if diff_type == self.DIFF_INSERT:
1475
+ # Insertion
1476
+ patch.diffs.append(diffs[x])
1477
+ patch.length2 += len(diff_text)
1478
+ postpatch_text = (postpatch_text[:char_count2] + diff_text +
1479
+ postpatch_text[char_count2:])
1480
+ elif diff_type == self.DIFF_DELETE:
1481
+ # Deletion.
1482
+ patch.length1 += len(diff_text)
1483
+ patch.diffs.append(diffs[x])
1484
+ postpatch_text = (postpatch_text[:char_count2] +
1485
+ postpatch_text[char_count2 + len(diff_text):])
1486
+ elif (diff_type == self.DIFF_EQUAL and
1487
+ len(diff_text) <= 2 * self.Patch_Margin and
1488
+ len(patch.diffs) != 0 and len(diffs) != x + 1):
1489
+ # Small equality inside a patch.
1490
+ patch.diffs.append(diffs[x])
1491
+ patch.length1 += len(diff_text)
1492
+ patch.length2 += len(diff_text)
1493
+
1494
+ if (diff_type == self.DIFF_EQUAL and
1495
+ len(diff_text) >= 2 * self.Patch_Margin):
1496
+ # Time for a new patch.
1497
+ if len(patch.diffs) != 0:
1498
+ self.patch_addContext(patch, prepatch_text)
1499
+ patches.append(patch)
1500
+ patch = patch_obj()
1501
+ # Unlike Unidiff, our patch lists have a rolling context.
1502
+ # https://github.com/google/diff-match-patch/wiki/Unidiff
1503
+ # Update prepatch text & pos to reflect the application of the
1504
+ # just completed patch.
1505
+ prepatch_text = postpatch_text
1506
+ char_count1 = char_count2
1507
+
1508
+ # Update the current character count.
1509
+ if diff_type != self.DIFF_INSERT:
1510
+ char_count1 += len(diff_text)
1511
+ if diff_type != self.DIFF_DELETE:
1512
+ char_count2 += len(diff_text)
1513
+
1514
+ # Pick up the leftover patch if not empty.
1515
+ if len(patch.diffs) != 0:
1516
+ self.patch_addContext(patch, prepatch_text)
1517
+ patches.append(patch)
1518
+ return patches
1519
+
1520
+ def patch_deepCopy(self, patches):
1521
+ """Given an array of patches, return another array that is identical.
1522
+
1523
+ Args:
1524
+ patches: Array of Patch objects.
1525
+
1526
+ Returns:
1527
+ Array of Patch objects.
1528
+ """
1529
+ patchesCopy = []
1530
+ for patch in patches:
1531
+ patchCopy = patch_obj()
1532
+ # No need to deep copy the tuples since they are immutable.
1533
+ patchCopy.diffs = patch.diffs[:]
1534
+ patchCopy.start1 = patch.start1
1535
+ patchCopy.start2 = patch.start2
1536
+ patchCopy.length1 = patch.length1
1537
+ patchCopy.length2 = patch.length2
1538
+ patchesCopy.append(patchCopy)
1539
+ return patchesCopy
1540
+
1541
+ def patch_apply(self, patches, text):
1542
+ """Merge a set of patches onto the text. Return a patched text, as well
1543
+ as a list of true/false values indicating which patches were applied.
1544
+
1545
+ Args:
1546
+ patches: Array of Patch objects.
1547
+ text: Old text.
1548
+
1549
+ Returns:
1550
+ Two-element tuple containing the new text and a list of boolean values.
1551
+ """
1552
+ if not patches:
1553
+ return (text, [])
1554
+
1555
+ # Deep copy the patches so that no changes are made to originals.
1556
+ patches = self.patch_deepCopy(patches)
1557
+
1558
+ nullPadding = self.patch_addPadding(patches)
1559
+ text = nullPadding + text + nullPadding
1560
+ self.patch_splitMax(patches)
1561
+
1562
+ # delta keeps track of the offset between the expected and actual location
1563
+ # of the previous patch. If there are patches expected at positions 10 and
1564
+ # 20, but the first patch was found at 12, delta is 2 and the second patch
1565
+ # has an effective expected position of 22.
1566
+ delta = 0
1567
+ results = []
1568
+ for patch in patches:
1569
+ expected_loc = patch.start2 + delta
1570
+ text1 = self.diff_text1(patch.diffs)
1571
+ end_loc = -1
1572
+ if len(text1) > self.Match_MaxBits:
1573
+ # patch_splitMax will only provide an oversized pattern in the case of
1574
+ # a monster delete.
1575
+ start_loc = self.match_main(text, text1[:self.Match_MaxBits],
1576
+ expected_loc)
1577
+ if start_loc != -1:
1578
+ end_loc = self.match_main(text, text1[-self.Match_MaxBits:],
1579
+ expected_loc + len(text1) - self.Match_MaxBits)
1580
+ if end_loc == -1 or start_loc >= end_loc:
1581
+ # Can't find valid trailing context. Drop this patch.
1582
+ start_loc = -1
1583
+ else:
1584
+ start_loc = self.match_main(text, text1, expected_loc)
1585
+ if start_loc == -1:
1586
+ # No match found. :(
1587
+ results.append(False)
1588
+ # Subtract the delta for this failed patch from subsequent patches.
1589
+ delta -= patch.length2 - patch.length1
1590
+ else:
1591
+ # Found a match. :)
1592
+ results.append(True)
1593
+ delta = start_loc - expected_loc
1594
+ if end_loc == -1:
1595
+ text2 = text[start_loc : start_loc + len(text1)]
1596
+ else:
1597
+ text2 = text[start_loc : end_loc + self.Match_MaxBits]
1598
+ if text1 == text2:
1599
+ # Perfect match, just shove the replacement text in.
1600
+ text = (text[:start_loc] + self.diff_text2(patch.diffs) +
1601
+ text[start_loc + len(text1):])
1602
+ else:
1603
+ # Imperfect match.
1604
+ # Run a diff to get a framework of equivalent indices.
1605
+ diffs = self.diff_main(text1, text2, False)
1606
+ if (len(text1) > self.Match_MaxBits and
1607
+ self.diff_levenshtein(diffs) / float(len(text1)) >
1608
+ self.Patch_DeleteThreshold):
1609
+ # The end points match, but the content is unacceptably bad.
1610
+ results[-1] = False
1611
+ else:
1612
+ self.diff_cleanupSemanticLossless(diffs)
1613
+ index1 = 0
1614
+ for (op, data) in patch.diffs:
1615
+ if op != self.DIFF_EQUAL:
1616
+ index2 = self.diff_xIndex(diffs, index1)
1617
+ if op == self.DIFF_INSERT: # Insertion
1618
+ text = text[:start_loc + index2] + data + text[start_loc +
1619
+ index2:]
1620
+ elif op == self.DIFF_DELETE: # Deletion
1621
+ text = text[:start_loc + index2] + text[start_loc +
1622
+ self.diff_xIndex(diffs, index1 + len(data)):]
1623
+ if op != self.DIFF_DELETE:
1624
+ index1 += len(data)
1625
+ # Strip the padding off.
1626
+ text = text[len(nullPadding):-len(nullPadding)]
1627
+ return (text, results)
1628
+
1629
+  def patch_addPadding(self, patches):
+    """Add some padding on text start and end so that edges can match
+    something. Intended to be called only from within patch_apply.
+
+    Args:
+      patches: Array of Patch objects.
+
+    Returns:
+      The padding string added to each side.
+    """
+    paddingLength = self.Patch_Margin
+    nullPadding = ""
+    for x in range(1, paddingLength + 1):
+      nullPadding += chr(x)
+
+    # Bump all the patches forward.
+    for patch in patches:
+      patch.start1 += paddingLength
+      patch.start2 += paddingLength
+
+    # Add some padding on start of first diff.
+    patch = patches[0]
+    diffs = patch.diffs
+    if not diffs or diffs[0][0] != self.DIFF_EQUAL:
+      # Add nullPadding equality.
+      diffs.insert(0, (self.DIFF_EQUAL, nullPadding))
+      patch.start1 -= paddingLength  # Should be 0.
+      patch.start2 -= paddingLength  # Should be 0.
+      patch.length1 += paddingLength
+      patch.length2 += paddingLength
+    elif paddingLength > len(diffs[0][1]):
+      # Grow first equality.
+      extraLength = paddingLength - len(diffs[0][1])
+      newText = nullPadding[len(diffs[0][1]):] + diffs[0][1]
+      diffs[0] = (diffs[0][0], newText)
+      patch.start1 -= extraLength
+      patch.start2 -= extraLength
+      patch.length1 += extraLength
+      patch.length2 += extraLength
+
+    # Add some padding on end of last diff.
+    patch = patches[-1]
+    diffs = patch.diffs
+    if not diffs or diffs[-1][0] != self.DIFF_EQUAL:
+      # Add nullPadding equality.
+      diffs.append((self.DIFF_EQUAL, nullPadding))
+      patch.length1 += paddingLength
+      patch.length2 += paddingLength
+    elif paddingLength > len(diffs[-1][1]):
+      # Grow last equality.
+      extraLength = paddingLength - len(diffs[-1][1])
+      newText = diffs[-1][1] + nullPadding[:extraLength]
+      diffs[-1] = (diffs[-1][0], newText)
+      patch.length1 += extraLength
+      patch.length2 += extraLength
+
+    return nullPadding
+
+  def patch_splitMax(self, patches):
+    """Look through the patches and break up any which are longer than the
+    maximum limit of the match algorithm.
+    Intended to be called only from within patch_apply.
+
+    Args:
+      patches: Array of Patch objects.
+    """
+    patch_size = self.Match_MaxBits
+    if patch_size == 0:
+      # Python has the option of not splitting strings due to its ability
+      # to handle integers of arbitrary precision.
+      return
+    for x in range(len(patches)):
+      if patches[x].length1 <= patch_size:
+        continue
+      bigpatch = patches[x]
+      # Remove the big old patch.
+      del patches[x]
+      x -= 1
+      start1 = bigpatch.start1
+      start2 = bigpatch.start2
+      precontext = ''
+      while len(bigpatch.diffs) != 0:
+        # Create one of several smaller patches.
+        patch = patch_obj()
+        empty = True
+        patch.start1 = start1 - len(precontext)
+        patch.start2 = start2 - len(precontext)
+        if precontext:
+          patch.length1 = patch.length2 = len(precontext)
+          patch.diffs.append((self.DIFF_EQUAL, precontext))
+
+        while (len(bigpatch.diffs) != 0 and
+               patch.length1 < patch_size - self.Patch_Margin):
+          (diff_type, diff_text) = bigpatch.diffs[0]
+          if diff_type == self.DIFF_INSERT:
+            # Insertions are harmless.
+            patch.length2 += len(diff_text)
+            start2 += len(diff_text)
+            patch.diffs.append(bigpatch.diffs.pop(0))
+            empty = False
+          elif (diff_type == self.DIFF_DELETE and len(patch.diffs) == 1 and
+              patch.diffs[0][0] == self.DIFF_EQUAL and
+              len(diff_text) > 2 * patch_size):
+            # This is a large deletion. Let it pass in one chunk.
+            patch.length1 += len(diff_text)
+            start1 += len(diff_text)
+            empty = False
+            patch.diffs.append((diff_type, diff_text))
+            del bigpatch.diffs[0]
+          else:
+            # Deletion or equality. Only take as much as we can stomach.
+            diff_text = diff_text[:patch_size - patch.length1 -
+                                  self.Patch_Margin]
+            patch.length1 += len(diff_text)
+            start1 += len(diff_text)
+            if diff_type == self.DIFF_EQUAL:
+              patch.length2 += len(diff_text)
+              start2 += len(diff_text)
+            else:
+              empty = False
+
+            patch.diffs.append((diff_type, diff_text))
+            if diff_text == bigpatch.diffs[0][1]:
+              del bigpatch.diffs[0]
+            else:
+              bigpatch.diffs[0] = (bigpatch.diffs[0][0],
+                                   bigpatch.diffs[0][1][len(diff_text):])
+
+        # Compute the head context for the next patch.
+        precontext = self.diff_text2(patch.diffs)
+        precontext = precontext[-self.Patch_Margin:]
+        # Append the end context for this patch.
+        postcontext = self.diff_text1(bigpatch.diffs)[:self.Patch_Margin]
+        if postcontext:
+          patch.length1 += len(postcontext)
+          patch.length2 += len(postcontext)
+          if len(patch.diffs) != 0 and patch.diffs[-1][0] == self.DIFF_EQUAL:
+            patch.diffs[-1] = (self.DIFF_EQUAL, patch.diffs[-1][1] +
+                               postcontext)
+          else:
+            patch.diffs.append((self.DIFF_EQUAL, postcontext))
+
+        if not empty:
+          x += 1
+          patches.insert(x, patch)
+
+  def patch_toText(self, patches):
+    """Take a list of patches and return a textual representation.
+
+    Args:
+      patches: Array of Patch objects.
+
+    Returns:
+      Text representation of patches.
+    """
+    text = []
+    for patch in patches:
+      text.append(str(patch))
+    return "".join(text)
+
+  def patch_fromText(self, textline):
+    """Parse a textual representation of patches and return a list of patch
+    objects.
+
+    Args:
+      textline: Text representation of patches.
+
+    Returns:
+      Array of Patch objects.
+
+    Raises:
+      ValueError: If invalid input.
+    """
+    patches = []
+    if not textline:
+      return patches
+    text = textline.split('\n')
+    while len(text) != 0:
+      m = re.match(r"^@@ -(\d+),?(\d*) \+(\d+),?(\d*) @@$", text[0])
+      if not m:
+        raise ValueError("Invalid patch string: " + text[0])
+      patch = patch_obj()
+      patches.append(patch)
+      patch.start1 = int(m.group(1))
+      if m.group(2) == '':
+        patch.start1 -= 1
+        patch.length1 = 1
+      elif m.group(2) == '0':
+        patch.length1 = 0
+      else:
+        patch.start1 -= 1
+        patch.length1 = int(m.group(2))
+
+      patch.start2 = int(m.group(3))
+      if m.group(4) == '':
+        patch.start2 -= 1
+        patch.length2 = 1
+      elif m.group(4) == '0':
+        patch.length2 = 0
+      else:
+        patch.start2 -= 1
+        patch.length2 = int(m.group(4))
+
+      del text[0]
+
+      while len(text) != 0:
+        if text[0]:
+          sign = text[0][0]
+        else:
+          sign = ''
+        line = urllib.parse.unquote(text[0][1:])
+        if sign == '+':
+          # Insertion.
+          patch.diffs.append((self.DIFF_INSERT, line))
+        elif sign == '-':
+          # Deletion.
+          patch.diffs.append((self.DIFF_DELETE, line))
+        elif sign == ' ':
+          # Minor equality.
+          patch.diffs.append((self.DIFF_EQUAL, line))
+        elif sign == '@':
+          # Start of next patch.
+          break
+        elif sign == '':
+          # Blank line? Whatever.
+          pass
+        else:
+          # WTF?
+          raise ValueError("Invalid patch mode: '%s'\n%s" % (sign, line))
+        del text[0]
+    return patches
+
+
+class patch_obj:
+  """Class representing one patch operation.
+  """
+
+  def __init__(self):
+    """Initializes with an empty list of diffs.
+    """
+    self.diffs = []
+    self.start1 = None
+    self.start2 = None
+    self.length1 = 0
+    self.length2 = 0
+
+  def __str__(self):
+    """Emulate GNU diff's format.
+    Header: @@ -382,8 +481,9 @@
+    Indices are printed as 1-based, not 0-based.
+
+    Returns:
+      The GNU diff string.
+    """
+    if self.length1 == 0:
+      coords1 = str(self.start1) + ",0"
+    elif self.length1 == 1:
+      coords1 = str(self.start1 + 1)
+    else:
+      coords1 = str(self.start1 + 1) + "," + str(self.length1)
+    if self.length2 == 0:
+      coords2 = str(self.start2) + ",0"
+    elif self.length2 == 1:
+      coords2 = str(self.start2 + 1)
+    else:
+      coords2 = str(self.start2 + 1) + "," + str(self.length2)
+    text = ["@@ -", coords1, " +", coords2, " @@\n"]
+    # Escape the body of the patch with %xx notation.
+    for (op, data) in self.diffs:
+      if op == diff_match_patch.DIFF_INSERT:
+        text.append("+")
+      elif op == diff_match_patch.DIFF_DELETE:
+        text.append("-")
+      elif op == diff_match_patch.DIFF_EQUAL:
+        text.append(" ")
+      # High ascii will raise UnicodeDecodeError. Use Unicode instead.
+      data = data.encode("utf-8")
+      text.append(urllib.parse.quote(data, "!~*'();/?:@&=+$,# ") + "\n")
+    return "".join(text)
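
The header handling in `patch_fromText` and `patch_obj.__str__` above is easy to misread, because a zero-length side keeps its start unshifted while every other case converts between 0-based and 1-based indices. The following is a minimal standalone sketch of that round trip, not part of the committed file. The helper names (`parse_coords`, `parse_header`, `format_coords`, `format_header`) are hypothetical and chosen only for illustration; the regular expression and the three length cases are copied from the code in the diff.

```python
import re

# Same pattern as in patch_fromText: "@@ -start1[,length1] +start2[,length2] @@"
HEADER_RE = re.compile(r"^@@ -(\d+),?(\d*) \+(\d+),?(\d*) @@$")


def parse_coords(start_text, length_text):
    """Convert one side of a header into (0-based start, length)."""
    start = int(start_text)
    if length_text == "":
        # Length omitted: exactly one character, start printed 1-based.
        return start - 1, 1
    if length_text == "0":
        # Zero length: the start is printed without the +1 shift.
        return start, 0
    return start - 1, int(length_text)


def parse_header(line):
    """Return (start1, length1, start2, length2) for one header line."""
    m = HEADER_RE.match(line)
    if not m:
        raise ValueError("Invalid patch string: " + line)
    return (parse_coords(m.group(1), m.group(2)) +
            parse_coords(m.group(3), m.group(4)))


def format_coords(start, length):
    """Inverse of parse_coords, mirroring the branches in __str__."""
    if length == 0:
        return "%d,0" % start
    if length == 1:
        return str(start + 1)
    return "%d,%d" % (start + 1, length)


def format_header(start1, length1, start2, length2):
    """Build the header line (without the trailing newline)."""
    return "@@ -%s +%s @@" % (format_coords(start1, length1),
                              format_coords(start2, length2))


if __name__ == "__main__":
    header = "@@ -382,8 +481,9 @@"
    fields = parse_header(header)
    print(fields)
    print(format_header(*fields) == header)
```

With the docstring's example header `@@ -382,8 +481,9 @@`, the parsed starts are 381 and 480, and formatting those fields again reproduces the original header string.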