Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add scan command #199

Closed
wants to merge 5 commits into from
Closed

add scan command #199

wants to merge 5 commits into from

Conversation

NaNShaner
Copy link
Contributor

add scan command

@NaNShaner
Copy link
Contributor Author

@HDT3213 貌似是raft这块的逻辑

@@ -412,6 +412,91 @@ func execCopy(mdb *Server, conn redis.Connection, args [][]byte) redis.Reply {
return protocol.MakeIntReply(1)
}

// execScan iteratively output all keys in the current db
func execScan(db *DB, args [][]byte) redis.Reply {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scan 的参数有 MATCH,COUNT, TYPE 3个,实际执行 scan 的函数签名应该是 scan0(cursor, pattern, count, typ)。
现在的解析命令行逻辑过于晦涩, 可以参考一下 execSet 的实现

if count >= size {
return dict.Keys(), nextCursor
}
remainingKeys := dict.table[cursor:]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

table 中取出来的是 shard, 为啥起名叫 remainingKeys?

@@ -435,3 +436,56 @@ func (dict *ConcurrentDict) RWUnLocks(writeKeys []string, readKeys []string) {
}
}
}

// ScanKeys iteratively output all keys in the current db
func (dict *ConcurrentDict) ScanKeys(cursor, count int, matchKey string) ([]string, int) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matchKey -> pattern

for key, _ := range s.m {
if key != "" {
if matchKey != "*" {
pattern, err := wildcard.CompilePattern(matchKey)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CompilePattern 执行一次就可以了。。

nextCursor := 0
size := dict.Len()
result := make([]string, count)
r := make(map[string]struct{})
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不要使用过于简单的变量名

i++
}
}
if count <= i {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scan 命令不保证返回的元素数一定等于 count 参数,但是保证从遍历开始直到完整遍历结束期间, 一直存在于数据集内的所有元素都会被返回。如果到了 count 就中止的话,当前shard 中可能有一些 key 在遍历期间一直存在但因为遍历中止未被返回。

所以应完整遍历shard, 当 len(result) >= count 后返回即可。

for _, shard := range shards {
    for k, _ := range shard.m {
          result.add(key)
    } 
    if len(result) >= count {
          break
    }
} 

@HDT3213
Copy link
Owner

HDT3213 commented Jul 29, 2023

两个大问题

  1. 解析命令行逻辑过于晦涩, 可以参考一下 execSet 的实现

  2. scan 命令不保证返回的元素数一定等于 count 参数,但是保证: 「从遍历开始直到完整遍历结束期间, 一直存在于数据集内的所有元素都会被返回」。如果到了 count 就中止的话,当前shard 中可能有一些 key 在遍历期间一直存在但因为遍历中止未被返回。

所以应完整遍历shard, 当 len(result) >= count 后返回即可。

for _, shard := range shards {
    for k, _ := range shard.m {
          result.add(key)
    } 
    if len(result) >= count {
          break
    }
} 

return dict.Keys(), nextCursor
}
remainingKeys := dict.table[cursor:]
i := 0
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

直接用 len(r) 不就行了

@NaNShaner
Copy link
Contributor Author

按照您的建议做了一些修改,貌似test failed。略有些不解🤔。

@HDT3213
Copy link
Owner

HDT3213 commented Aug 9, 2023

=== RUN   TestScan


    keys_test.go:332: test failed




--- FAIL: TestScan (0.05s)

新加的 TestScan 挂了

@NaNShaner
Copy link
Contributor Author

我本地跑是正常的

image

@NaNShaner
Copy link
Contributor Author

知道了, 每次scan迭代返回的key是无序的,所以在test比对的时候会偶发不一致。

database/keys.go Outdated
return protocol.MakeNullBulkReply()
}

func scanWithArg(db *DB, args [][]byte, match bool) redis.Reply {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

函数签名改成: execScan0(cursor, pattern, count, typ)

}

// execScanWithArg execute scan command based on cli args
func execScanWithArg(db *DB, args [][]byte, argType int) redis.Reply {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

函数签名改成 execScan0(cursor string, pattern string, count int)
type 参数可以先不加

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

上次说的 type 不是参数类型, scan 命令可以根据 string/list/hash/set/sortedset 类型进行过滤.


// execScan iteratively output all keys in the current db
func execScan(db *DB, args [][]byte) redis.Reply {
const (
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var count int
var pattern string
for i := 1; i < argsNum; i++ {
	arg := strings.ToLower(string(args[i]))
	if arg == "count" {
              count = itoa(args[i+1])
              i++
        }
        ....
}
execScan0(cursor, pattern, count)

这样不就完事了,为啥写这么麻烦

i++
}
}
if len(m) >= count {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#199 (comment)

遍历完 shard 再返回

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里有个问题没想明白,如果每次都需要把所有的shard遍历完再返回的话
1、性能上是不是存在而外的开销?
2、每次遍历以count参数的个数作为一个取值范围,直至遍历完成,不也可以保证保证数据的全量被遍历到么?(不保证已经被遍历过的shard发生数据变化)

@NaNShaner NaNShaner closed this by deleting the head repository May 5, 2024
@lhpqaq lhpqaq mentioned this pull request Jul 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants