Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] std::string::replace("aa", "aa", "") returns "a" #1408

Closed
1 task done
arcusmaximus opened this issue Nov 9, 2023 · 8 comments
Closed
1 task done

[Bug] std::string::replace("aa", "aa", "") returns "a" #1408

arcusmaximus opened this issue Nov 9, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@arcusmaximus
Copy link

arcusmaximus commented Nov 9, 2023

Operating System

Windows

What's the issue you encountered?

std::string::replace() doesn't give the expected result even in quite simple cases.

How can the issue be reproduced?

#include <std/string.pat>
std::print(std::string::replace("aa", "aa", ""));

a
#include <std/string.pat>
std::print(std::string::replace("ResourceType::TEXTURE", "ResourceType::", ""));

Type::TEXTURE
#include <std/string.pat>
std::print(std::string::replace("ResourceType::MODEL", "ResourceType::", ""));

ceType::MODEL

ImHex Version

1.31.0 d12f501

ImHex Build Type

  • Nightly or built from sources

Installation type

Portable

Additional context?

No response

@arcusmaximus arcusmaximus added the bug Something isn't working label Nov 9, 2023
@arcusmaximus
Copy link
Author

The definition of std::string::replace has the following line at the end:

result = result + std::string::substr(string,string_len-pattern_len+1,pattern_len);

This should be:

result = result + std::string::substr(string,i,string_len-i);

Might also want to add a check against pattern_len == 0 at the start as this currently results in an endless loop.

@paxcut
Copy link
Contributor

paxcut commented Nov 10, 2023

This error occurs only the length of the string to be matched is longer than the length of the string obtained by removing all matching string patterns. While it is true that the proposed code change fixes that case, it does so in a rather obscure way.

The issue us that the size of the pattern to match advances index i by the matched pattern size which is used to check the condition to terminate the loop but not to calculate the position and length of the remaining characters in input string. To fix it define a new variable called remaining_len and initialize it using the length of the string. As index i moves forward, remaining_len moves backwards and then used to calculate the right position and size of the remaining chars. The full code

fn replace(str string, str pattern, str replace) {
    	u32 string_len, pattern_len, replace_len, remaining_len;
	string_len  = std::string::length(string);
	pattern_len = std::string::length(pattern);
	replace_len = std::string::length(replace);
	remaining_len = string_len;
		
	if (pattern_len > string_len)
		return string;
			
	str result;
	u32 i;
	while (i <= (string_len - pattern_len)) {
			
		if (std::string::substr(string, i, pattern_len) == pattern) {
			result = result + replace;
			i = i + pattern_len;
			remaining_len = remaining_len - pattern_len;
		} else {
			result = result + std::string::at(string, i);
			i = i + 1;
		       remaining_len = remaining_len - 1;
		}
	}
	result = result + std::string::substr(string,string_len-remaining_len,remaining_len);

	return result;
};

I think this makes the code look more intentional than just using i. I tried your examples and some more using multiple match targets and they all seem to work, so I'll submit a pr if you agree.

@arcusmaximus
Copy link
Author

arcusmaximus commented Nov 10, 2023

The proposed code calculates the starting position based on the remaining length, which is rather obscure... :D

Joking aside, I don't really mind how this function is implemented, as long as it's not too inefficient and does its job. Just please also add a check on pattern_len == 0 to prevent the infinite loop.

@paxcut
Copy link
Contributor

paxcut commented Nov 10, 2023

Yes, thank you for the comment. I didnt intend to duplicate i but simply refactoring it as the remaining len. So I added the new variable and as I am writing about it i am already forgetting to remove the old one... Not trying to be obtuse on purpose, but I hate off by one errors and this was the best way i could think to make sure I don't make one again in this function. In my mind using the remaining length is a lot more clear than using string index which is why I didn't see this bug when I fixed the last issue about this ( #1279)

pattern_len ==0 , yes, thanks for that too. It makes sense that naive code will search forever for a pattern that doesn't exist. The latest version

  fn replace( str string, str pattern, str replace ) {
    	u32 string_len, pattern_len, replace_len, remaining_len;
	string_len  = std::string::length(string);
	pattern_len = std::string::length(pattern);
	replace_len = std::string::length(replace);
	remaining_len = string_len;
		
	if ( pattern_len > string_len || pattern_len == 0 )
		return string;
			
	str result;
	while ( remaining_len >= pattern_len ) {
			
		if ( std::string::substr( string, string_len - remaining_len, pattern_len ) == pattern ) {
			result = result + replace;
			remaining_len = remaining_len - pattern_len;
		} else {
			result = result + std::string::at(string, string_len - remaining_len);
		        remaining_len = remaining_len - 1;
		}
	}
	result = result + std::string::substr( string, string_len - remaining_len, remaining_len );
		
	return result;
};

@paxcut
Copy link
Contributor

paxcut commented Nov 10, 2023

I changed my mind in order to avoid code duplication. I also renamed all variables to sensible names and removed unused ones. Finally I rewrote the empty strings check to be more comprehensive. Most of the changes are based on the suggestions you made. Please take a look at the resulting code which should be as efficient as possible. If you spot something you find that can be improved or any questions/comments please let me know. This is the PR candidate

fn replace(str string, str pattern, str replace) {
    s32 string_len  = std::string::length(string);
    s32 pattern_len = std::string::length(pattern);
	
    if (pattern_len > string_len || pattern_len * string_len == 0 )
	return string;

    str result;
    s32 string_index;
    s32 remaining_len = string_len;
    while ( pattern_len <= remaining_len ) {
        if ( std::string::substr(string, string_index, pattern_len) == pattern ) {
            result += replace;
            string_index += pattern_len;
        } else {
            result += std::string::at(string, string_index);
    	    string_index += 1;
	}
        remaining_len = string_len - string_index;
    }
    result += std::string::substr(string, string_index, remaining_len );
    return result;
};

These are the tests I am checking the code with and the results

std::print("1: {}",replace("aa", "aa", ""));
std::print("2: {}",replace("ResourceType::TEXTURE", "ResourceType::", ""));
std::print("3: {}",replace("aaahijkaauvwxaaa", "aa", "bbb"));

std::print("4: {}", replace("bb", "bb", "z"));
std::print("5: {}", replace("bcd", "bc", "z"));
std::print("6: {}", replace("bcdef", "bcd", "z"));
std::print("7: {}", replace("", "", ""));
std::print("8: {}", replace("", "", "z"));
std::print("9: {}", replace("   ", " ", "z"));

std::print("10: {}", replace("abcdefg", "b", "z"));
std::print("11: {}", replace("abcdefg", "bc", "z"));
std::print("12: {}", replace("abcdefg", "bcd", "z"));
std::print("13: {}", replace("abc?*?efg", "?*?", "_z_"));
std::print("14: {}", replace("abc d efg", " ", "-"));
std::print("15: {}", replace("abc**d**efg", "**", "-+-"));


I: 1: 
I: 2: TEXTURE
I: 3: bbbahijkbbbuvwxbbba
I: 4: z
I: 5: zd
I: 6: zef
I: 7: 
I: 8: 
I: 9: zzz
I: 10: azcdefg
I: 11: azdefg
I: 12: azefg
I: 13: abc_z_efg
I: 14: abc-d-efg
I: 15: abc-+-d-+-efg
I: Pattern exited with code: 0
I: Evaluation took 0.108065s

If you can think of some tests I missed also please post a note about it.

@arcusmaximus
Copy link
Author

Looks good to me!

@iTrooz
Copy link
Collaborator

iTrooz commented Jan 18, 2024

@paxcut Can this issue be closed ?

@paxcut
Copy link
Contributor

paxcut commented Jan 18, 2024

OP seemed happy and I can't think of a test that's not checked already so I am fairly sure it can.

@iTrooz iTrooz closed this as completed Jan 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants