Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of set_bits by avoiding to set individual bits #6288

Merged
merged 48 commits into from
Sep 15, 2024
Merged
Show file tree
Hide file tree
Changes from 24 commits
Commits
Show all changes
48 commits
Select commit Hold shift + click to select a range
6e8c864
bench
kazuyukitanimura Aug 22, 2024
a81ba56
fix: Optimize set_bits
kazuyukitanimura Aug 22, 2024
57634f2
clippy
kazuyukitanimura Aug 23, 2024
32ff203
clippyj
kazuyukitanimura Aug 23, 2024
f94f312
miri
kazuyukitanimura Aug 23, 2024
e9cd77a
fix: Optimize set_bits
kazuyukitanimura Aug 23, 2024
842c2b1
fix: Optimize set_bits
kazuyukitanimura Aug 23, 2024
06de184
fix: Optimize set_bits
kazuyukitanimura Aug 23, 2024
03b0db8
fix: Optimize set_bits
kazuyukitanimura Aug 23, 2024
7faa5f3
fix: Optimize set_bits
kazuyukitanimura Aug 23, 2024
e3d812d
fix: Optimize set_bits
kazuyukitanimura Aug 23, 2024
13dec63
fix: Optimize set_bits
kazuyukitanimura Aug 23, 2024
68cdaf2
fix: Optimize set_bits
kazuyukitanimura Aug 23, 2024
1e9de38
miri
kazuyukitanimura Aug 24, 2024
f1e1bbd
miri
kazuyukitanimura Aug 24, 2024
f294663
miri
kazuyukitanimura Aug 24, 2024
39719c4
miri
kazuyukitanimura Aug 24, 2024
9fbb87d
miri
kazuyukitanimura Aug 24, 2024
74b9d80
miri
kazuyukitanimura Aug 24, 2024
25c309e
miri
kazuyukitanimura Aug 24, 2024
7905330
miri
kazuyukitanimura Aug 24, 2024
6dd9771
miri
kazuyukitanimura Aug 24, 2024
0e956cc
miri
kazuyukitanimura Aug 24, 2024
272ecbb
miri
kazuyukitanimura Aug 24, 2024
08ebf20
address review comments
kazuyukitanimura Sep 3, 2024
d751a7f
address review comments
kazuyukitanimura Sep 3, 2024
ef2864f
address review comments
kazuyukitanimura Sep 4, 2024
e69cf9a
Revert "address review comments"
kazuyukitanimura Sep 4, 2024
b5f8bca
address review comments
kazuyukitanimura Sep 5, 2024
9c15417
address review comments
kazuyukitanimura Sep 5, 2024
533381a
address review comments
kazuyukitanimura Sep 5, 2024
dca9ab8
address review comments
kazuyukitanimura Sep 5, 2024
7f3c3fb
address review comments
kazuyukitanimura Sep 5, 2024
6ccedd2
address review comments
kazuyukitanimura Sep 6, 2024
ff2f3ca
address review comments
kazuyukitanimura Sep 6, 2024
fb46cb0
address review comments
kazuyukitanimura Sep 6, 2024
3fd5e3e
address review comments
kazuyukitanimura Sep 6, 2024
be3076e
address review comments
kazuyukitanimura Sep 6, 2024
58868c1
address review comments
kazuyukitanimura Sep 6, 2024
a15db14
address review comments
kazuyukitanimura Sep 6, 2024
fefafa7
Revert "address review comments"
kazuyukitanimura Sep 6, 2024
d8c3f08
address review comments
kazuyukitanimura Sep 6, 2024
cc5ec2b
address review comments
kazuyukitanimura Sep 6, 2024
4c39dc8
address review comments
kazuyukitanimura Sep 9, 2024
f4789be
address review comments
kazuyukitanimura Sep 9, 2024
59fd805
address review comments
kazuyukitanimura Sep 10, 2024
7d81076
address review comments
kazuyukitanimura Sep 13, 2024
f185a19
address review comments
kazuyukitanimura Sep 13, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion arrow-buffer/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -44,10 +44,14 @@ rand = { version = "0.8", default-features = false, features = ["std", "std_rng"

[build-dependencies]

[[bench]]
name = "bit_mask"
harness = false

[[bench]]
name = "i256"
harness = false

[[bench]]
name = "offset"
harness = false
harness = false
58 changes: 58 additions & 0 deletions arrow-buffer/benches/bit_mask.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

use arrow_buffer::bit_mask::set_bits;
use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};

fn criterion_benchmark(c: &mut Criterion) {
kazuyukitanimura marked this conversation as resolved.
Show resolved Hide resolved
let mut group = c.benchmark_group("bit_mask");

for offset_write in (0..8).step_by(5) {
for offset_read in (0..8).step_by(5) {
for len in [1, 17, 65] {
for datum in [0u8, 0xADu8] {
let x = (offset_write, offset_read, len, datum);
group.bench_with_input(
BenchmarkId::new(
"set_bits",
format!(
"offset_write_{}_offset_read_{}_len_{}_datum_{}",
x.0, x.1, x.2, x.3
),
),
&x,
|b, &x| {
b.iter(|| {
set_bits(
black_box(&mut [0u8; 9]),
black_box(&[x.3; 9]),
black_box(x.0),
black_box(x.1),
black_box(x.2),
)
});
},
);
}
}
}
}
group.finish();
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
142 changes: 117 additions & 25 deletions arrow-buffer/src/util/bit_mask.rs
Original file line number Diff line number Diff line change
Expand Up @@ -17,8 +17,7 @@

//! Utils for working with packed bit masks

use crate::bit_chunk_iterator::BitChunks;
use crate::bit_util::{ceil, get_bit, set_bit};
use crate::bit_util::ceil;

/// Sets all bits on `write_data` in the range `[offset_write..offset_write+len]` to be equal to the
/// bits in `data` in the range `[offset_read..offset_read+len]`
Expand All @@ -32,33 +31,126 @@ pub fn set_bits(
) -> usize {
let mut null_count = 0;

let mut bits_to_align = offset_write % 8;
if bits_to_align > 0 {
bits_to_align = std::cmp::min(len, 8 - bits_to_align);
let mut acc = 0;
while len > acc {
let (n, l) = set_upto_64bits(
write_data,
data,
offset_write + acc,
offset_read + acc,
len - acc,
);
null_count += n;
acc += l;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I briefly thought that this line of code said acc += 1 rather than acc += l. It may be worth having a longer/different variable name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andygrove updated

}
let mut write_byte_index = ceil(offset_write + bits_to_align, 8);

// Set full bytes provided by bit chunk iterator (which iterates in 64 bits at a time)
let chunks = BitChunks::new(data, offset_read + bits_to_align, len - bits_to_align);
chunks.iter().for_each(|chunk| {
null_count += chunk.count_zeros();
write_data[write_byte_index..write_byte_index + 8].copy_from_slice(&chunk.to_le_bytes());
write_byte_index += 8;
});

// Set individual bits both to align write_data to a byte offset and the remainder bits not covered by the bit chunk iterator
let remainder_offset = len - chunks.remainder_len();
(0..bits_to_align)
.chain(remainder_offset..len)
.for_each(|i| {
if get_bit(data, offset_read + i) {
set_bit(write_data, offset_write + i);

null_count
}

#[inline]
fn set_upto_64bits(
write_data: &mut [u8],
data: &[u8],
offset_write: usize,
offset_read: usize,
len: usize,
kazuyukitanimura marked this conversation as resolved.
Show resolved Hide resolved
) -> (usize, usize) {
let read_byte = offset_read / 8;
let read_shift = offset_read % 8;
let write_byte = offset_write / 8;
let write_shift = offset_write % 8;

if len >= 64 {
// SAFETY: chunk gets masked when necessary, so it is safe
let chunk = unsafe { read_bytes_to_u64(data, read_byte, 8) };
kazuyukitanimura marked this conversation as resolved.
Show resolved Hide resolved
if read_shift == 0 {
if write_shift == 0 {
kazuyukitanimura marked this conversation as resolved.
Show resolved Hide resolved
let len = 64;
let null_count = chunk.count_zeros() as usize;
write_u64_bytes(write_data, write_byte, chunk);
(null_count, len)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would help reading code if more comments can be added for the different cases. E.g.,

Suggested change
let len = 64;
let null_count = chunk.count_zeros() as usize;
write_u64_bytes(write_data, write_byte, chunk);
(null_count, len)
// read/write data both aligned
let len = 64;
let null_count = chunk.count_zeros() as usize;
write_u64_bytes(write_data, write_byte, chunk);
(null_count, len)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

} else {
null_count += 1;
let len = 64 - write_shift;
let chunk = chunk << write_shift;
let null_count = len - chunk.count_ones() as usize;
or_write_u64_bytes(write_data, write_byte, chunk);
(null_count, len)
}
});
} else if write_shift == 0 {
let len = 64 - 8; // 56 bits so that write_shift == 0 for the next iteration
kazuyukitanimura marked this conversation as resolved.
Show resolved Hide resolved
let chunk = (chunk >> read_shift) & 0x00FFFFFFFFFFFFFF; // 56 bits mask
let null_count = len - chunk.count_ones() as usize;
write_u64_bytes(write_data, write_byte, chunk);
(null_count, len)
} else {
let len = 64 - std::cmp::max(read_shift, write_shift);
let chunk = (chunk >> read_shift) << write_shift;
let null_count = len - chunk.count_ones() as usize;
or_write_u64_bytes(write_data, write_byte, chunk);
(null_count, len)
}
} else if len == 1 {
let c = (unsafe { *data.as_ptr().add(read_byte) } >> read_shift) & 1;
kazuyukitanimura marked this conversation as resolved.
Show resolved Hide resolved
let ptr = write_data.as_mut_ptr();
unsafe { *ptr.add(write_byte) |= c << write_shift };
((c ^ 1) as usize, 1)
Copy link
Member

@viirya viirya Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add unit test for the different cases? E.g., when len = 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

} else {
let len = std::cmp::min(len, 64 - std::cmp::max(read_shift, write_shift));
let bytes = ceil(len + read_shift, 8);
// SAFETY: chunk gets masked, so it is safe
let chunk = unsafe { read_bytes_to_u64(data, read_byte, bytes) };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some // SAFETY: explanations to unsafe usages?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

let mask = u64::MAX >> (64 - len);
let chunk = (chunk >> read_shift) & mask;
let chunk = chunk << write_shift;
let null_count = len - chunk.count_ones() as usize;
kazuyukitanimura marked this conversation as resolved.
Show resolved Hide resolved
let bytes = ceil(len + write_shift, 8);
let ptr = unsafe { write_data.as_mut_ptr().add(write_byte) };
for (i, c) in chunk.to_le_bytes().iter().enumerate().take(bytes) {
unsafe { *ptr.add(i) |= c };
}
kazuyukitanimura marked this conversation as resolved.
Show resolved Hide resolved
(null_count, len)
}
}

/// # Safety
/// The caller must ensure all arguments are within the valid range.
/// The caller must be aware `8 - count` bytes in the returned value are uninitialized.
#[inline]
#[cfg(not(miri))]
kazuyukitanimura marked this conversation as resolved.
Show resolved Hide resolved
unsafe fn read_bytes_to_u64(data: &[u8], offset: usize, count: usize) -> u64 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function doesn't limit reading bytes to be up 8 bytes. Do you want to add an assert?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @viirya updated

let mut tmp = std::mem::MaybeUninit::<u64>::uninit();
let src = data.as_ptr().add(offset);
// SAFETY: the caller must not use the uninitialized `8 - count` bytes in the returned value.
kazuyukitanimura marked this conversation as resolved.
Show resolved Hide resolved
unsafe {
std::ptr::copy_nonoverlapping(src, tmp.as_mut_ptr() as *mut u8, count);
tmp.assume_init()
}
}

#[cfg(miri)]
unsafe fn read_bytes_to_u64(data: &[u8], offset: usize, count: usize) -> u64 {
let mut arr = [0u8; 8];
arr[0..count].copy_from_slice(&data[offset..(offset + count)]);
u64::from_le_bytes(arr)
}

null_count as usize
#[inline]
kazuyukitanimura marked this conversation as resolved.
Show resolved Hide resolved
fn write_u64_bytes(data: &mut [u8], offset: usize, chunk: u64) {
// SAFETY: the caller must ensure `data` has `offset..(offset + 8)` range
unsafe {
let ptr = data.as_mut_ptr().add(offset) as *mut u64;
ptr.write_unaligned(chunk);
}
}

#[inline]
kazuyukitanimura marked this conversation as resolved.
Show resolved Hide resolved
fn or_write_u64_bytes(data: &mut [u8], offset: usize, chunk: u64) {
// SAFETY: the caller must ensure `data` has `offset..(offset + 8)` range
unsafe {
let ptr = data.as_mut_ptr().add(offset);
let chunk = chunk | (*ptr) as u64;
(ptr as *mut u64).write_unaligned(chunk);
}
}

#[cfg(test)]
Expand Down
Loading