Use BOW values projected to dense as stub embeddings for test #80
I'm concerned about the memory usage of the random_matrix.
Isn't there some simpler solution that doesn't require pre-allocating such a huge matrix?
self.dimension = dimension
rng = np.random.default_rng(seed)
self.random_matrix = rng.standard_normal((self.input_dim, self.dimension))
Wait - isn't that a HUGE shape??
It's a dense matrix, isn't it? Won't it take a lot of memory?
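For a rough sense of scale: `rng.standard_normal` returns float64 by default, so a dense matrix of shape (input_dim, dimension) takes input_dim × dimension × 8 bytes. The sketch below computes that for a few hypothetical sizes. The `input_dim` values and the helper name `dense_matrix_bytes` are assumptions for illustration, since the real `input_dim` is not shown in this thread.

```python
import numpy as np


def dense_matrix_bytes(input_dim: int, dimension: int, dtype=np.float64) -> int:
    """Bytes needed to store a dense (input_dim, dimension) array of `dtype`."""
    return input_dim * dimension * np.dtype(dtype).itemsize


# Hypothetical sizes; the actual input_dim used by the encoder is not stated here.
for input_dim in (1_000, 100_000):
    for dimension in (128, 1536):
        mib = dense_matrix_bytes(input_dim, dimension) / 2**20
        print(f"input_dim={input_dim:>7}, dimension={dimension:>4}: {mib:10.2f} MiB")
```

The cost grows linearly in both dimensions, so a large output dimension multiplies whatever the vocabulary-sized input already costs.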
I removed the need to actually hold this matrix. For the default dimensions it's a few MB, but for an output dimension of 1536 it can become larger than we would like. Anyway, we're not even holding a matrix anymore.
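One possible way to avoid holding the matrix, sketched here as an assumption rather than as what this PR actually does, is to regenerate each token's projection row on demand from a generator seeded by the token. The function name `project_without_matrix` and the use of `zlib.crc32` as a stable token hash are illustrative choices, not taken from the PR.

```python
import zlib
from typing import Mapping

import numpy as np


def project_without_matrix(
    token_counts: Mapping[str, int], dimension: int, seed: int = 0
) -> np.ndarray:
    """Project bag-of-words counts to a dense vector without storing a matrix.

    Each token's row is regenerated from an RNG seeded by (seed, token hash),
    so memory use is O(dimension) regardless of vocabulary size.
    """
    out = np.zeros(dimension)
    for token, count in token_counts.items():
        token_id = zlib.crc32(token.encode("utf-8"))  # stable across runs
        rng = np.random.default_rng([seed, token_id])
        out += count * rng.standard_normal(dimension)
    return out


vec = project_without_matrix({"hello": 2, "world": 1}, dimension=8)
print(vec.shape)
```

The trade-off is extra compute per token in exchange for constant memory, which seems acceptable for a test stub.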
LG(enough)TM
Problem
The current stub encoder is flaky and sometimes places very different texts very close together.
It also does not place similar texts close together; a text is only close to itself.
Solution
Implement a stub dense encoder that projects the bag-of-words sparse vector to a dense vector.
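A minimal sketch of the idea, under stated assumptions: tokens are split on whitespace and hashed into `input_dim` buckets, then the count vector is multiplied by a fixed random matrix and normalized. The helper names `bag_of_words` and `encode`, the tokenization, and the sizes are hypothetical and not taken from the PR. A small explicit matrix is used here only to keep the example short.

```python
import zlib

import numpy as np


def bag_of_words(text: str, input_dim: int) -> np.ndarray:
    """Count whitespace-separated tokens into `input_dim` hashed buckets."""
    bow = np.zeros(input_dim)
    for token in text.lower().split():
        bow[zlib.crc32(token.encode("utf-8")) % input_dim] += 1.0
    return bow


def encode(text: str, random_matrix: np.ndarray) -> np.ndarray:
    """Project the bag-of-words vector to a unit-length dense vector."""
    dense = bag_of_words(text, random_matrix.shape[0]) @ random_matrix
    norm = np.linalg.norm(dense)
    return dense / norm if norm > 0 else dense


rng = np.random.default_rng(42)
random_matrix = rng.standard_normal((4096, 128))  # illustrative sizes only

a = encode("the quick brown fox jumps", random_matrix)
b = encode("the quick brown fox sleeps", random_matrix)
c = encode("completely unrelated sentence about databases", random_matrix)

# Cosine similarity: texts sharing most words should score higher.
print("similar:  ", float(a @ b))
print("different:", float(a @ c))
```

Because the projection is linear and seeded, the same text always maps to the same vector, and texts that share most of their words end up close together, which addresses both issues listed under Problem.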
Type of Change
Test Plan
This change is part of the test framework itself.