Theoretical integer overflow in StringArrayBuilder / LargeStringArrayBuilder #13796

Closed
alamb opened this issue Dec 16, 2024 · 3 comments · Fixed by #13802
Labels
enhancement New feature or request

Comments

Contributor

alamb commented Dec 16, 2024

Is your feature request related to a problem or challenge?

Similarly to #13759, we found another potential code vulnerability in a security audit performed by InfluxData.

The public method with_capacity of StringArrayBuilder accepts a parameter item_capacity: usize. It is used to create a MutableBuffer with a capacity calculated as (item_capacity + 1) * size_of::<i32>(), which can overflow undetected, leaving the buffer too small. The subsequent unsafe call to push a value into the buffer can then lead to an out-of-bounds memory access.
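
To make the arithmetic concrete, here is a minimal sketch using only the standard library. The helper wrapped_offsets_capacity is hypothetical (it is not part of DataFusion); it mimics what the expression evaluates to in a release build with default settings, where integer overflow wraps silently instead of panicking.

```rust
use std::mem::size_of;

/// Hypothetical helper (not part of DataFusion): reproduces the wrapping
/// behavior of `(item_capacity + 1) * size_of::<i32>()` in a release build.
fn wrapped_offsets_capacity(item_capacity: usize) -> usize {
    item_capacity
        .wrapping_add(1)
        .wrapping_mul(size_of::<i32>())
}

fn main() {
    // Typical input: 10 items need 11 offsets of 4 bytes each.
    assert_eq!(wrapped_offsets_capacity(10), 44);

    // Pathological input: `usize::MAX + 1` wraps to 0, so the requested
    // capacity is 0 bytes even though one offset is pushed immediately.
    assert_eq!(wrapped_offsets_capacity(usize::MAX), 0);

    println!("ok");
}
```

With a requested capacity of 0 bytes, the unchecked push of the first offset has no valid memory to write into.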

In datafusion/functions/src/strings.rs, lines 125ff:

impl StringArrayBuilder {
    pub fn with_capacity(item_capacity: usize, data_capacity: usize) -> Self {
        let mut offsets_buffer = MutableBuffer::with_capacity((item_capacity + 1) * size_of::<i32>());
        // SAFETY: the first offset value is definitely not going to exceed the bounds.
        unsafe { offsets_buffer.push_unchecked(0_i32) };
        Self {
            offsets_buffer,
            value_buffer: MutableBuffer::with_capacity(data_capacity),
        }
    }

I analyzed the potential risk, and I agree there is a risk of memory unsafety but I do not think it is exploitable via DataFusion APIs. Specifically, the only callsites are:

let mut builder = StringArrayBuilder::with_capacity(len, data_size);

The argument is taken from

// Array
let len = array_len.unwrap();

So to trigger this code you would have to provide an input record batch with more than u32::MAX rows.
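
As a rough check on how large the row count would have to be, the sketch below computes the smallest item_capacity for which the capacity expression no longer fits in a usize. Both helper names are hypothetical, and the sketch assumes a target where usize is at most 64 bits wide and a nonzero element size. On a 64-bit target the threshold works out to usize::MAX / 4, which is far beyond u32::MAX.

```rust
use std::mem::size_of;

/// Hypothetical helper: smallest `item_capacity` for which
/// `(item_capacity + 1) * elem_size` exceeds `usize::MAX`.
/// Assumes `elem_size` is nonzero.
fn min_overflowing_item_capacity(elem_size: usize) -> usize {
    usize::MAX / elem_size
}

/// Hypothetical helper: evaluates the expression in `u128`, which cannot
/// overflow here as long as `usize` is at most 64 bits wide.
fn fits_in_usize(item_capacity: usize, elem_size: usize) -> bool {
    (item_capacity as u128 + 1) * (elem_size as u128) <= usize::MAX as u128
}

fn main() {
    let elem_size = size_of::<i32>();
    let threshold = min_overflowing_item_capacity(elem_size);

    // One below the threshold still fits; the threshold itself does not.
    assert!(fits_in_usize(threshold - 1, elem_size));
    assert!(!fits_in_usize(threshold, elem_size));

    println!("smallest overflowing item_capacity: {threshold}");
}
```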

Reproducer showing segfault:

Here is a test case I wrote:

andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ git diff
diff --git a/datafusion/functions/src/strings.rs b/datafusion/functions/src/strings.rs
index d2fb5d585..8e7c05fa1 100644
--- a/datafusion/functions/src/strings.rs
+++ b/datafusion/functions/src/strings.rs
@@ -422,3 +422,13 @@ impl ColumnarValueRef<'_> {
         }
     }
 }
+
+
+#[cfg(test)]
+mod test {
+    use super::*;
+    #[test]
+    fn test_overflow() {
+        let builder = StringArrayBuilder::with_capacity(usize::MAX, usize::MAX);
+    }
+}

It must be run in release mode, because debug builds panic on the integer overflow instead of wrapping:

cargo test --release  -p datafusion-functions --lib -- overflow
error: test failed, to rerun pass `-p datafusion-functions --lib`

Caused by:
  process didn't exit successfully: `/Users/andrewlamb/Software/datafusion/target/release/deps/datafusion_functions-e3de98ad3d4dc27c overflow` (signal: 11, SIGSEGV: invalid memory reference)
andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$

Describe the solution you'd like

Recommendations: Check for integer overflow when calculating the capacity.

Describe alternatives you've considered

No response

Additional context

No response

@alamb alamb added the enhancement New feature or request label Dec 16, 2024
Contributor Author

alamb commented Dec 16, 2024

Note that the same issue affects LargeStringArrayBuilder as well.

Contributor Author

alamb commented Dec 16, 2024

I think we can probably change this from

(item_capacity + 1) * size_of::<i32>()

To something like

(item_capacity + 1).checked_mul(size_of::<i32>())
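
A slightly fuller sketch of that idea follows. Two details matter: item_capacity + 1 can itself overflow, so the addition is checked too, and checked_mul returns an Option that the caller has to handle. The helper offsets_byte_capacity is hypothetical (its name and signature are assumptions, not the fix that was actually merged), and the i64 case assumes LargeStringArrayBuilder uses 64-bit offsets as Arrow's LargeStringArray does.

```rust
use std::mem::size_of;

/// Hypothetical helper (not the actual DataFusion fix): number of bytes
/// needed for `item_capacity + 1` offsets of type `O`, or `None` if the
/// calculation would overflow `usize`.
fn offsets_byte_capacity<O>(item_capacity: usize) -> Option<usize> {
    item_capacity.checked_add(1)?.checked_mul(size_of::<O>())
}

fn main() {
    // i32 offsets (StringArrayBuilder) and i64 offsets (LargeStringArrayBuilder).
    assert_eq!(offsets_byte_capacity::<i32>(10), Some(44));
    assert_eq!(offsets_byte_capacity::<i64>(10), Some(88));

    // Overflow in the addition and in the multiplication are both caught.
    assert_eq!(offsets_byte_capacity::<i32>(usize::MAX), None);
    assert_eq!(offsets_byte_capacity::<i64>(usize::MAX / 8), None);

    // A caller can turn `None` into a panic with a clear message
    // (or an error) before any unsafe write happens.
    let capacity = offsets_byte_capacity::<i32>(1024).expect("capacity integer overflow");
    assert_eq!(capacity, 4100);
}
```

Failing loudly before the buffer is allocated keeps the later push_unchecked call within the bounds its SAFETY comment assumes.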

@alamb alamb changed the title Theoretical integer overflow Theoretical integer overflow in StringArrayBuilder / LargeStringArrayBuilder Dec 16, 2024
Contributor

wiedld commented Dec 16, 2024

take
