Dramatically optimize algorithm in the common case by excluding match… #89

epatey · 2018-03-18T22:10:10Z

…ing heads and tails before using LCS. For example, in the case of a single insert, the algorithm changes from O(m*n) to O(m+n). With arrays of 1,000 entries, this change reduces the number of comparisons from ~1,000,000 to ~2,000 and the size of the table used by the algorithm from ~1,000,000 to 2.
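
To make the table-size figures concrete, here is a minimal sketch rather than the code in this PR. `matchingEndCounts` and `lcsTableCells` are hypothetical helpers, and the sketch assumes the LCS table holds (m + 1) * (n + 1) cells for inputs of length m and n.

```swift
// Hypothetical helper (not from the PR): counts the matching leading and
// trailing elements, capping the tail so it never overlaps the head.
func matchingEndCounts<T: Equatable>(_ lhs: [T], _ rhs: [T]) -> (head: Int, tail: Int) {
    let minCount = min(lhs.count, rhs.count)
    let head = zip(lhs, rhs).prefix(while: { $0.0 == $0.1 }).count
    let tail = zip(lhs.reversed(), rhs.reversed())
        .prefix(minCount - head)
        .prefix(while: { $0.0 == $0.1 })
        .count
    return (head, tail)
}

// Assumption: the LCS table has (m + 1) * (n + 1) cells.
func lcsTableCells(_ m: Int, _ n: Int) -> Int {
    return (m + 1) * (n + 1)
}

let before = Array(0..<1000)
var after = before
after.insert(-1, at: 400) // a single insert

let (head, tail) = matchingEndCounts(before, after)
let midLHS = before.count - head - tail // 0 elements left in the middle
let midRHS = after.count - head - tail  // 1 element left in the middle

print(lcsTableCells(before.count, after.count)) // 1003002
print(lcsTableCells(midLHS, midRHS))            // 2
```

With a single insert, everything except the inserted element is consumed by the head and tail scans, so LCS only sees a 0-element slice and a 1-element slice.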

epatey · 2018-03-19T14:06:31Z

I haven't had a chance to actually measure the perf difference, but you could replace

let matchingHeadCount = zip(lhs, rhs).prefix() { $0.0 == $0.1 }.map() { $0.0 }.count
let matchingTailCount = matchingHeadCount == minTotalCount
    ? 0 // if the matching head consumed all of either of the arrays, there's no tail
    : zip(lhs.reversed(), rhs.reversed()).prefix(minTotalCount - matchingHeadCount).prefix() { $0.0 == $0.1 }.reversed().map() { $0.0 }.count

with something like:

let (matchingHeadCount, matchingTailCount) = lhs.withUnsafeBufferPointer { unsafeLHS in
    return rhs.withUnsafeBufferPointer { unsafeRHS -> (Int, Int) in
        // Count matching leading elements; stop at the first mismatch.
        var mhc = minTotalCount
        for i in 0..<minTotalCount {
            if unsafeLHS[i] != unsafeRHS[i] {
                mhc = i
                break
            }
        }

        // The tail may not overlap the head that was already matched.
        let maxPossibleTail = minTotalCount - mhc
        if maxPossibleTail < 1 {
            return (mhc, 0)
        }

        // Count matching trailing elements, walking back from each end.
        var mtc = maxPossibleTail
        for i in 0..<maxPossibleTail {
            if unsafeLHS[unsafeLHS.endIndex - i - 1] != unsafeRHS[unsafeRHS.endIndex - i - 1] {
                mtc = i
                break
            }
        }

        return (mhc, mtc)
    }
}

if it makes a perf difference.

epatey · 2018-03-20T14:59:19Z

So I did the perf test. After I added the proper `.lazy` calls to the sequences, the perf difference between the sequence-oriented approach and the `withUnsafeBufferPointer` approach is pretty small, and the sequence approach is way more readable/concise.

I still don't have an instinct around what operations drop you out of the lazy domain.
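
As a rough illustration of the eager/lazy split (a standalone sketch, not code from this PR): counting how often a `map` closure runs shows that the eager form visits every element up front, while the `.lazy` form only does work for the elements that are actually read. The counter names are made up for this example.

```swift
var eagerCalls = 0
var lazyCalls = 0
let numbers = Array(1...1000)

// Eager: `map` on an Array runs the closure for every element and
// allocates a new Array before `first` is ever consulted.
let eagerFirst = numbers.map { (n: Int) -> Int in
    eagerCalls += 1
    return n * 2
}.first

// Lazy: `lazy.map` wraps the array; the closure runs only for the
// elements that are actually read through the wrapper.
let lazyFirst = numbers.lazy.map { (n: Int) -> Int in
    lazyCalls += 1
    return n * 2
}.first

print(eagerCalls, lazyCalls)
```

A rule of thumb that seems to hold, offered as an assumption rather than a guarantee: anything that has to hand back a concrete `Array` or a single value (`Array(...)`, `count`, `reduce`, `sorted()`) forces evaluation at that point, while chained `map`, `filter`, and `prefix(while:)` calls on a `.lazy` wrapper keep deferring the work.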

jflinter

@epatey - left a couple of comments. Overall this is really nice, thank you!

jflinter · 2018-04-02T21:21:47Z

Dwifft/Dwifft.swift

 /// Namespace for the `diff` and `apply` functions.
 public enum Dwifft {

+    internal static func matchingEndsInfo<Value: Equatable>(_ lhs: [Value], _ rhs: [Value]) -> (Int, ArraySlice<Value>, ArraySlice<Value>) {
+        let minTotalCount = min(lhs.count, rhs.count)
+        let matchingHeadCount = zip(lhs, rhs).lazy.prefix() { $0.0 == $0.1 }.count()


These two lines feel a bit too clever for their own good - it's hard for me to understand what they're doing at a quick glance. I think using a couple of plain old for loops would be preferable here.

jflinter · 2018-04-02T21:22:48Z

Dwifft/Dwifft.swift

 /// Namespace for the `diff` and `apply` functions.
 public enum Dwifft {

+    internal static func matchingEndsInfo<Value: Equatable>(_ lhs: [Value], _ rhs: [Value]) -> (Int, ArraySlice<Value>, ArraySlice<Value>) {


Even though it's an internal function, can you please add a docstring to this method to help with future debugging etc?
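
For illustration, a documented version written with plain `for` loops might look like the sketch below. This is not the code that was merged. It keeps the signature from the diff and assumes the returned tuple is (matching head count, remaining `lhs` slice, remaining `rhs` slice), which is inferred from how `diff` destructures the result.

```swift
/// Finds the elements that two arrays share at their start and end, so that
/// only the differing middle sections need to go through LCS.
///
/// - Parameters:
///   - lhs: an array
///   - rhs: another array
/// - Returns: the number of matching leading elements, followed by the
///   slices of `lhs` and `rhs` that remain once the matching head and tail
///   are excluded. (Assumed meaning of the tuple; see the note above.)
func matchingEndsInfo<Value: Equatable>(_ lhs: [Value], _ rhs: [Value]) -> (Int, ArraySlice<Value>, ArraySlice<Value>) {
    let minTotalCount = min(lhs.count, rhs.count)

    // Walk forward until the first mismatch.
    var matchingHeadCount = minTotalCount
    for i in 0..<minTotalCount {
        if lhs[i] != rhs[i] {
            matchingHeadCount = i
            break
        }
    }

    // Walk backward, but never into the head that was already matched.
    let maxPossibleTail = minTotalCount - matchingHeadCount
    var matchingTailCount = maxPossibleTail
    for i in 0..<maxPossibleTail {
        if lhs[lhs.count - 1 - i] != rhs[rhs.count - 1 - i] {
            matchingTailCount = i
            break
        }
    }

    return (matchingHeadCount,
            lhs[matchingHeadCount..<(lhs.count - matchingTailCount)],
            rhs[matchingHeadCount..<(rhs.count - matchingTailCount)])
}

// Usage: only the inserted 9 is left for LCS to look at.
let (sharedHead, middleLHS, middleRHS) = matchingEndsInfo([1, 2, 3, 4], [1, 2, 9, 3, 4])
print(sharedHead, Array(middleLHS), Array(middleRHS))
```

Note that the returned slices keep the indices of the original arrays, so callers should use `startIndex` rather than assume the slices start at 0.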

jflinter · 2018-04-02T21:24:03Z

Dwifft/Dwifft.swift

    /// Returns the sequence of `DiffStep`s required to transform one array into another.
    ///
    /// - Parameters:
    ///   - lhs: an array
    ///   - rhs: another, uh, array
    /// - Returns: the series of transformations that, when applied to `lhs`, will yield `rhs`.
    public static func diff<Value: Equatable>(_ lhs: [Value], _ rhs: [Value]) -> [DiffStep<Value>] {
+        let (matchingHeadCount, lhs, rhs) = matchingEndsInfo(lhs, rhs)


Nitpicking, but can you not shadow the lhs and rhs variable names here? Made it harder for me to understand how this compiled at first glance...

epatey mentioned this pull request Mar 19, 2018

Proposed significant optimization for common cases #88

Open

Keep sequences lazy for performance of matchingEndsInfo.

82ee099

jflinter requested changes Apr 2, 2018
