String concatenation in a loop slow compared to Python #559

willstudy · 2024-05-21T03:36:55Z

Test code:

import time

def test_f_string_format_performance(num_entries: int) -> float:
    start_time = time.time()
    for i in range(num_entries):
        formatted_string = f"Number {i}: {i * 2}"
    return time.time() - start_time

def test_string_concatenation_performance(num_entries: int) -> float:
    start_time = time.time()
    result = ""
    for i in range(num_entries):
        result += f"Number {i}: {i * 2}\n"
    return time.time() - start_time


num_entries = 200000 

format_time = test_f_string_format_performance(num_entries)
print(f"f-string format operation time: {format_time:.6f} seconds")

concatenation_time = test_string_concatenation_performance(num_entries)
print(f"String concatenation operation time: {concatenation_time:.6f} seconds")

Codon test run:

(base) Y6RC26NQM4:press bytedance$ ./build/codon build -release -exe string.py
(base) Y6RC26NQM4:press bytedance$ ./string
f-string format operation time: 0.134792 seconds
String concatenation operation time: 25.759953 seconds

CPython (python3) test run:

(base) Y6RC26NQM4:press bytedance$ python3 string.py
f-string format operation time: 0.021931 seconds
String concatenation operation time: 0.032574 seconds


willstudy · 2024-05-21T03:39:56Z

@arshajii, thanks for the help.

arshajii · 2024-05-21T18:19:46Z

After some research, it seems that Python string objects have an optimization: before concatenating, CPython checks whether the left-hand string's reference count is 1, and if so it resizes that string with realloc in place instead of creating a new object. This effectively skips the long memcpy of the existing contents.
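To see why skipping that copy matters so much, here is a rough cost model (a sketch, not a measurement). If every += allocates a fresh string and copies both operands, the bytes copied grow quadratically with the number of appends; if the buffer can be extended in place, each piece is copied only once. The model assumes realloc never has to move the buffer, which real allocators do not guarantee, and the function names are hypothetical.

```python
def bytes_copied_fresh(piece_lengths: list[int]) -> int:
    """Bytes copied when each concatenation builds a brand-new string."""
    total = 0
    current = 0
    for n in piece_lengths:
        total += current + n  # copy the old contents plus the new piece
        current += n
    return total


def bytes_copied_in_place(piece_lengths: list[int]) -> int:
    """Bytes copied when the buffer is always extended in place (best case)."""
    return sum(piece_lengths)  # only each new piece is written


# Roughly the benchmark above: 200000 pieces of about 20 bytes each.
pieces = [20] * 200_000
print(bytes_copied_fresh(pieces))     # 400002000000
print(bytes_copied_in_place(pieces))  # 4000000
```

The piece lengths in the real benchmark vary between roughly 13 and 23 bytes, so these totals only indicate the order of magnitude.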

It's not as simple in Codon, since we don't keep reference counts (we use a garbage collector instead). It should still be possible to determine via the GC whether an object has only a single reference, but we would need to benchmark this to make sure it doesn't introduce performance problems in other scenarios.

In the meantime, the standard library has an internal string buffer type, _strbuf, which can be used here. With it, this benchmark runs 25% faster than CPython 3.11 on my machine:

def test_string_concatenation_performance(num_entries: int) -> float:
    start_time = time.time()
    result = _strbuf()    # <-- use strbuf object
    for i in range(num_entries):
        result.append(f"Number {i}: {i * 2}\n")
    result = str(result)  # <-- convert strbuf to string
    return time.time() - start_time

I'll keep this issue open until we figure out a general solution.

willstudy · 2024-05-22T08:49:59Z

@arshajii, could we add an __iadd__ method to the str class?

For example:

def __iadd__(self, other: str) -> str:
    len1 = self.len
    len2 = other.len
    len3 = len1 + len2

    self.len = len3
    self.ptr = realloc(self.ptr, len3, len1)
    str.memcpy(self.ptr + len1, other.ptr, len2)
    return self

How about this solution? The problem is that str is a tuple type, so its fields cannot be modified.

inumanag · 2024-09-23T05:08:42Z

Does it work if you do:

@extend
class str:
    def __iadd__(self, other: str):
        len1 = self.len
        len2 = other.len
        len3 = len1 + len2

        self.len = len3
        self.ptr = realloc(self.ptr, len3, len1)
        str.memcpy(self.ptr + len1, other.ptr, len2)
        return self

I think it should work. Let me know if it does not.
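One caveat with an unconditional in-place __iadd__: since str is a by-value (pointer, length) record, a copy of a string shares the same buffer, so growing one copy in place can overwrite bytes another copy still reads (and in real code realloc may also move the buffer, leaving the other copy pointing at freed memory). This is why CPython only takes the in-place path when the reference count is 1. The Python model below is hypothetical: FakeStr and iadd_in_place are invented for illustration, with a bytearray standing in for the raw buffer, and it is not how Codon's str is implemented. It only demonstrates the aliasing problem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeStr:
    """Hypothetical by-value string record: a shared buffer plus a length."""
    buf: bytearray
    length: int

    def text(self) -> str:
        return bytes(self.buf[: self.length]).decode()


def iadd_in_place(s: FakeStr, other: str) -> FakeStr:
    """Hypothetical in-place append: writes past s.length in the shared buffer."""
    data = other.encode()
    s.buf[s.length :] = data  # overwrites anything already stored after s.length
    return FakeStr(s.buf, s.length + len(data))


a = FakeStr(bytearray(b"ab"), 2)
b = a                      # copying the record shares the same buffer
a = iadd_in_place(a, "c")  # a is now "abc"
b = iadd_in_place(b, "X")  # writes at offset 2, clobbering a's "c"
print(a.text())  # abX
print(b.text())  # abX
```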

arshajii changed the title from "Why does the performance of string operations become very poor after being statically compiled by Codon?" to "String concatenation in a loop slow compared to Python" on May 21, 2024

inumanag assigned arshajii Sep 23, 2024

inumanag self-assigned this Sep 23, 2024
