Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dramatically optimize algorithm in the common case by excluding match… #89

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

epatey
Copy link

@epatey epatey commented Mar 18, 2018

…ing heads and tails before using LCS. For example, in the case of single insert, the algorithm changes from O(m*n) to O(m+n). When the arrays contain 1,000 entries, for example, this change reduces the number of comparisons ~1,000,000 to ~2,000 and the size of the table used by the algorithm from ~1,000,000 to 2.

…ing heads and tails before using LCS. For example, in the case of single insert, the algorithm changes from O(m*n) to O(m+n). When the arrays contain 1,000 entries, for example, this change reduces the number of comparisons ~1,000,000 to ~2,000 and the size of the table used by the algorithm from ~1,000,000 to 2.
@epatey
Copy link
Author

epatey commented Mar 19, 2018

I haven't had a chance to actually measure the perf difference, but you could replace

let matchingHeadCount = zip(lhs, rhs).prefix() { $0.0 == $0.1 }.map() { $0.0 }.count
let matchingTailCount = matchingHeadCount == minTotalCount
    ? 0 // if the matching head consumed all of either of the arrays, there's no tail
    : zip(lhs.reversed(), rhs.reversed()).prefix(minTotalCount - matchingHeadCount).prefix() { $0.0 == $0.1 }.reversed().map() { $0.0 }.count

with something like:

let (matchingHeadCount, matchingTailCount) = lhs.withUnsafeBufferPointer { unsafeLHS in
    return rhs.withUnsafeBufferPointer { unsafeRHS -> (Int, Int) in
        var mhc = minTotalCount
        for i in  0..<minTotalCount {
            if (unsafeLHS[i] != unsafeRHS[i]) {
                mhc = i
                break
            }
        }
        
        let maxPossibleTail = minTotalCount - mhc
        if (maxPossibleTail < 1) {
            return (mhc, 0)
        }
        
        var mtc = maxPossibleTail
        for i in  0..<maxPossibleTail {
            if (unsafeLHS[unsafeLHS.endIndex - i - 1] != unsafeRHS[unsafeRHS.endIndex - i - 1]) {
                mtc = i
                break
            }
        }
        
        return (mhc, mtc)
    }
}

if it makes a perf difference.

@epatey
Copy link
Author

epatey commented Mar 20, 2018

So I did the perf test. After I added the proper .lazy's to the sequences, the perf difference between the sequence oriented approach and the unsafeBufferPointer approach is pretty small, and the sequence approach is way more readable/concise.

I still don't have an instinct around what operations drop you out of the lazy domain.

Copy link
Owner

@jflinter jflinter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@epatey - left a couple of comments. Overall this is really nice, thank you!

/// Namespace for the `diff` and `apply` functions.
public enum Dwifft {

internal static func matchingEndsInfo<Value: Equatable>(_ lhs: [Value], _ rhs: [Value]) -> (Int, ArraySlice<Value>, ArraySlice<Value>) {
let minTotalCount = min(lhs.count, rhs.count)
let matchingHeadCount = zip(lhs, rhs).lazy.prefix() { $0.0 == $0.1 }.count()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two lines feel a bit too clever for their own good - it's hard for me to understand what they're doing at a quick glance. I think using a couple of plain old for loops would be preferable here.

/// Namespace for the `diff` and `apply` functions.
public enum Dwifft {

internal static func matchingEndsInfo<Value: Equatable>(_ lhs: [Value], _ rhs: [Value]) -> (Int, ArraySlice<Value>, ArraySlice<Value>) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though it's an internal function, can you please add a docstring to this method to help with future debugging etc?

/// Returns the sequence of `DiffStep`s required to transform one array into another.
///
/// - Parameters:
/// - lhs: an array
/// - rhs: another, uh, array
/// - Returns: the series of transformations that, when applied to `lhs`, will yield `rhs`.
public static func diff<Value: Equatable>(_ lhs: [Value], _ rhs: [Value]) -> [DiffStep<Value>] {
let (matchingHeadCount, lhs, rhs) = matchingEndsInfo(lhs, rhs)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicking, but can you not shadow the lhs and rhs variable names here? Made it harder for me to understand how this compiled at first glance...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants