Check if sorting implementation is stable #177
Can you avoid modifying the definition of |
Absolutely! I've found an alternative approach that doesn't require modifying the definition of |
Shorten the code.
qtest.c
Outdated
/* If the number of elements is too large, it may take a long time to check the
 * stability of the sort. So, MAX_NODE is used to limit the number of elements
 * to check the stability of the sort. */
#define MAX_NODE 100000
Undefine MAX_NODE when it is no longer used. Also, rename it to MAX_NODES.
Explain why 100000 was picked in a scientific way.
In the previous implementation, the time complexity for comparing each pair of duplicated strings was
In the test traces/trace-14-perf.cmd, the largest number of nodes used with the sort command is 2,000,000. In traces/trace-15-perf.cmd, the second-highest number of nodes used with the sort command is 100,000. I therefore set MAX_NODES to 100,000 in an attempt to cover the second-highest case, although that exact value was a mistake on my part. Setting MAX_NODES to 2,000,000 caused a segmentation fault on my computer, so I opted to skip that case. I then measured the time taken by the sort command with different numbers of duplicated nodes.
The test script is as follows. I set MAX_NODES to 1,000,001 for the test. I only change the number of nodes inserted at the head, and then use perf stat ./qtest < test.cmd to measure the time.
test.cmd
new
ih a 10000
sort
quit
node count | elapsed time (seconds) |
---|---|
1000 | 0.027906238 |
10000 | 0.058188249 |
100000 | 2.445569385 |
150000 | 5.453358783 |
200000 | 9.671944194 |
300000 | 21.581793918 |
400000 | 40.176436261 |
500000 | 61.437037665 |
As shown above, exceeding 100,000 nodes in the test leads to significant performance degradation 🥹. Alternative approaches might be considered, but in the end I found it more straightforward to add a member directly to element_t. However, that approach deviates from the requirements of the assignment, so the implementation involves a trade-off.
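The measurements in the table are consistent with roughly quadratic growth: relative to the 100,000-node run, the 200,000-, 300,000-, 400,000- and 500,000-node runs take about 3.95x, 8.8x, 16.4x and 25.1x as long, close to the 4x, 9x, 16x and 25x that an O(n^2) cost would predict. The helper below is a minimal sketch for checking this; its name and signature are hypothetical and it is not part of qtest.c.

```c
/* Hypothetical helper, not part of qtest.c.
 * Given a baseline measurement (n0, t0) and another measurement (n, t),
 * return the observed slowdown t/t0 divided by the slowdown (n/n0)^2 that
 * quadratic growth would predict. A result near 1.0 means the two
 * measurements are consistent with O(n^2) behaviour. */
double quadratic_fit_ratio(double n0, double t0, double n, double t)
{
    double observed = t / t0;
    double predicted = (n / n0) * (n / n0);
    return observed / predicted;
}
```

For example, quadratic_fit_ratio(100000, 2.445569385, 500000, 61.437037665) compares the first and last large runs from the table.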
If the reasons provided above are deemed sufficient, I will create another commit to change MAX_NODES = 100000 to MAX_NODES = 100001, and will explain the rationale in the commit message.
> If the reasons provided above are deemed sufficient, I will create another commit to change MAX_NODES = 100000 to MAX_NODES = 100001, and will explain the rationale in the commit message.

Use git rebase -i to rework the commits.
d86d3ec to 1faadd8
Thank you for your review!
- Squash the commits and refine the git commit messages.
- Ensure the proposed change works for both ascending and descending order when a user selects the order via the "option" command.
I ensure that this approach enables stable detection for both ascending and descending sorts. Besides my own successful testing, this proposed change only examines adjacent nodes with identical values and compares their indices before |
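A minimal sketch of why a check on adjacent equal values does not depend on the sort direction: a stable sort keeps duplicates in their pre-sort order whether the keys end up ascending or descending, so only the recorded positions of equal neighbours need to be compared. The rec_t type and the function name are hypothetical illustrations, not the code in this pull request.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: a value plus the position it had before sorting. */
typedef struct {
    const char *value;
    size_t seq;
} rec_t;

/* Return true if every pair of adjacent records with equal values keeps its
 * pre-sort order. Only equal neighbours are inspected, so the same check
 * applies to ascending and descending results. */
bool adjacent_equal_keep_order(const rec_t *recs, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        if (strcmp(recs[i].value, recs[i + 1].value) == 0 &&
            recs[i].seq > recs[i + 1].seq)
            return false;
    }
    return true;
}
```

Because equal values are contiguous after sorting, checking each adjacent pair is enough to cover a whole run of duplicates.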
/* If the number of elements is too large, it may take a long time to check the
 * stability of the sort. So, MAX_NODES is used to limit the number of elements
 * to check the stability of the sort. */
#define MAX_NODES 100000
Can you use the sliding window technique to track partial nodes instead of using a predefined number of nodes during a customized sorting routine?
If we were to use the sliding window technique, how would we ensure the relative order of nodes? Without additional data structures, the information about the relative order of nodes would be lost after sorting. Can the sliding window still be utilized under these circumstances, or have I misunderstood your suggestion?
> If we were to use the sliding window technique, how would we ensure the relative order of nodes? Without additional data structures, the information about the relative order of nodes would be lost after sorting. Can the sliding window still be utilized under these circumstances, or have I misunderstood your suggestion?

Think of the facilities that the queue operations provide. We can allocate temporary nodes on demand. What I care about is the fixed length used for checking.
Since we're considering avoiding the fixed length for checking purposes, I'm thinking of trying to use the node's address as the key for hashing. With an additional data structure, we can determine the original order of the queue. This approach can handle a large amount of test data efficiently due to the properties of hashing, with reduced time complexity. However, we'll need to decide on a fixed array length if we're using an array as a hash table. I'm a bit stuck on this point.
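One possible way around the fixed array length, sketched here under assumptions: size the hash table from the queue length at the moment the check runs, so no compile-time limit is needed. The addr_map_t type and its functions are hypothetical names for illustration only; they use open addressing with linear probing and are not the approach that was merged.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical open-addressing table mapping a node address to the position
 * the node had before sorting. */
typedef struct {
    const void **keys; /* NULL marks an empty slot */
    size_t *pos;
    size_t cap; /* always a power of two */
} addr_map_t;

/* Allocate at least 2 * n slots so that n insertions keep the table at most
 * half full and probing always finds an empty slot. */
int addr_map_init(addr_map_t *m, size_t n)
{
    size_t cap = 16;
    while (cap < 2 * n)
        cap *= 2;
    m->keys = calloc(cap, sizeof(*m->keys));
    m->pos = calloc(cap, sizeof(*m->pos));
    m->cap = cap;
    if (!m->keys || !m->pos) {
        free(m->keys);
        free(m->pos);
        return -1;
    }
    return 0;
}

static size_t addr_hash(const void *p, size_t cap)
{
    /* Drop low alignment bits, then mix with a large odd multiplier. */
    uint64_t x = (uint64_t) (uintptr_t) p >> 3;
    x *= 0x9E3779B97F4A7C15ULL;
    return (size_t) (x >> 32) & (cap - 1);
}

/* Record (or overwrite) the pre-sort position of a node. */
void addr_map_put(addr_map_t *m, const void *node, size_t position)
{
    size_t i = addr_hash(node, m->cap);
    while (m->keys[i] && m->keys[i] != node)
        i = (i + 1) & (m->cap - 1);
    m->keys[i] = node;
    m->pos[i] = position;
}

/* Return 0 and store the position in *out, or -1 if node is absent. */
int addr_map_get(const addr_map_t *m, const void *node, size_t *out)
{
    size_t i = addr_hash(node, m->cap);
    while (m->keys[i]) {
        if (m->keys[i] == node) {
            *out = m->pos[i];
            return 0;
        }
        i = (i + 1) & (m->cap - 1);
    }
    return -1;
}

void addr_map_free(addr_map_t *m)
{
    free(m->keys);
    free(m->pos);
}
```

The trade-off is extra memory proportional to the queue length, in exchange for an expected constant-time lookup per duplicate pair.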
192ae40 to 12be675
Ensure that you write complete sentences in git commit messages.
Introduced node numbering in qtest.c to evaluate the stability of q_sort's sorting algorithm. When the algorithm encounters two nodes with the same value, it searches for the address of the node record in the nodes array. It then compares the found node to the current node (cur). If the found node is the same as the current node, it indicates that these two duplicate nodes have not been swapped in position after sorting. However, if the found node is cur->next, it means that the position of the nodes has been swapped. That is, the sorting implementation is unstable.

The performance of the testing code was evaluated by measuring the elapsed time for q_sort's operation on different numbers of nodes with duplicate values. Node counts ranging from 1000 to 500,000 were examined. Specifically, for the 1000-node count, the elapsed time was recorded as 0.0279 seconds, and for the 500,000-node count, it was 61.44 seconds. For the 100,000-node count, the elapsed time was 2.45 seconds. The elapsed time showed a significant increase starting from the 100,000-node count, underscoring potential performance issues with larger datasets.

This method relies on auxiliary data structures to track node pointers and their original order, avoiding alterations to the structure in queue.h. However, stability testing is limited to a maximum of 100,000 elements (MAX_NODES) to address potential performance concerns.
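The check described in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the code in qtest.c: node_t is a hypothetical singly linked, NULL-terminated stand-in for the real queue element, and order[] plays the role of the nodes array that records addresses in pre-sort order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a queue element; not the definition in queue.h. */
typedef struct node {
    const char *value;
    struct node *next;
} node_t;

/* order[] holds the node addresses in their pre-sort order. For each pair of
 * adjacent nodes with equal values, scan order[] from the start: meeting cur
 * first means the pair kept its order, meeting cur->next first means the two
 * duplicates were swapped by the sort. */
bool sort_is_stable(const node_t *head, const node_t *const *order, size_t n)
{
    for (const node_t *cur = head; cur && cur->next; cur = cur->next) {
        if (strcmp(cur->value, cur->next->value) != 0)
            continue;
        for (size_t i = 0; i < n; i++) {
            if (order[i] == cur)
                break; /* cur was first originally: order kept */
            if (order[i] == cur->next)
                return false; /* next was first originally: swapped */
        }
    }
    return true;
}
```

The inner scan is linear in the number of recorded nodes for each duplicate pair, which would fit the steep growth in elapsed time reported above when all values are identical.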
12be675 to b0aa080
Thanks to @yenslife for contributing!
Introduced node numbering in qtest.c to evaluate q_sort's stability. The modification assigns unique identifiers to each node, facilitating stability checks during sorting operations. If nodes with identical key values are found out of ascending order, an error is reported to maintain stable sorting.