Merge pull request #135 from Alexhuszagh/writesafe
This introduces several layers of security enhancements:

1. Removal of most unsafe code (according to count-unsafe, the code went from 160 unsafe functions and 3088 unsafe exprs to 8 unsafe functions and 1248 unsafe exprs). All the remaining unsafe code has much more clearly documented safety guarantees and is isolated into safe abstractions.
2. Clear documentation of where unsafe code is used, both at those locations and in the crate-level documentation, so it's clearly visible.

A security policy has also been added, with stricter requirements for soundness with PRs.

Closes #100.
Alexhuszagh authored Sep 14, 2024
2 parents 0970a59 + 7df1e74 commit 7c0100d
Showing 42 changed files with 570 additions and 1,007 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -6,6 +6,7 @@ Cargo.lock
/build
*.pyc
TODO.md
*.diff

# Perftools
perf.data
2 changes: 2 additions & 0 deletions CHANGELOG
@@ -32,11 +32,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Improved performance of integer and float parsing, particularly with small integers.
- Removed almost all unsafety in `lexical-util` and clearly documented the preconditions to use safely.
- Removed almost all unsafety in `lexical-write-integer` and clearly documented the preconditions to use safely.
- Writing special numbers even with invalid float formats is now always memory safe.

### Removed

- Support for mips (MIPS), mipsel (MIPS LE), mips64 (MIPS64 BE), and mips64el (MIPS64 LE) on Linux.
- All `_unchecked` API methods, since the performance benefits are dubious and it makes safety invariant checking much harder.
- The `safe` and `nightly` features, since ASM is now supported by the MSRV on stable and opt-in for memory-safe indexing is no longer relevant.

## [0.8.5] 2022-06-06

12 changes: 3 additions & 9 deletions README.md
@@ -139,8 +139,6 @@ Lexical is highly customizable, and contains numerous other optional features:
<blockquote>With format enabled, the number format is dictated through bitflags and masks packed into a <code>u128</code>. These dictate the valid syntax of parsed and written numbers, including enabling digit separators, requiring integer or fraction digits, and toggling case-sensitive exponent characters.</blockquote>
- **compact**: &ensp; Optimize for binary size at the expense of performance.
<blockquote>This minimizes the use of pre-computed tables, producing significantly smaller binaries.</blockquote>
- **safe**: &ensp; Requires all array indexing to be bounds-checked.
<blockquote>This has limited impact for number parsers, since they use safe indexing except where indexing without bounds checking and can general be shown to be sound. The number writers frequently use unsafe indexing, since we can easily over-estimate the number of digits in the output due to the fixed-length input. We use comprehensive fuzzing, UB detection via miri, and proving local safe invariants to ensure correctness without impacting performance.</blockquote>
- **f16**: &ensp; Add support for numeric conversions to-and-from 16-bit floats.
<blockquote>Adds <code>f16</code>, a half-precision IEEE-754 floating-point type, and <code>bf16</code>, the Brain Float 16 type, and numeric conversions to-and-from these floats. Note that since these are storage formats, and therefore do not have native arithmetic operations, all conversions are done using an intermediate <code>f32</code>.</blockquote>

@@ -331,16 +329,12 @@ lexical-core should also work on a wide variety of other architectures and ISAs.

The currently supported versions are:
- v1.0.x
- v0.8.x
- v0.7.x (Maintenance)
- v0.6.x (Maintenance)

Due to security considerations, all other versions have been yanked.

**Rustc Compatibility**

- v0.8.x supports 1.63+, including stable, beta, and nightly.
- v0.8.x supports 1.51+, including stable, beta, and nightly.
- v0.7.x supports 1.37+, including stable, beta, and nightly.
- v0.6.x supports Rustc 1.24+, including stable, beta, and nightly.
- v1.0.x supports 1.63+, including stable, beta, and nightly.

Please report any errors compiling a supported lexical-core version on a compatible Rustc version.

1 change: 0 additions & 1 deletion lexical-benchmark/parse-float/Cargo.toml
@@ -30,7 +30,6 @@ power-of-two = ["lexical-util/power-of-two", "lexical-parse-float/power-of-two"]
format = ["lexical-util/format", "lexical-parse-float/format"]
compact = ["lexical-util/compact", "lexical-parse-float/compact"]
asm = []
nightly = ["lexical-parse-float/nightly"]
integers = ["lexical-util/integers"]
floats = ["lexical-util/floats"]
json = []
12 changes: 0 additions & 12 deletions lexical-benchmark/parse-float/black_box.rs
@@ -1,5 +1,4 @@
// Optimized black box using the nicer assembly syntax.
#[cfg(feature = "asm")]
pub fn black_box(mut dummy: f64) -> f64 {
// The `asm!` macro was stabilized in 1.59.0.
use core::arch::asm;
@@ -12,14 +11,3 @@ pub fn black_box(mut dummy: f64) -> f64 {
dummy
}
}

// Optimized black box using the nicer assembly syntax.
#[cfg(not(feature = "asm"))]
#[allow(forgetting_copy_types)]
pub fn black_box(dummy: f64) -> f64 {
unsafe {
let x = core::ptr::read_volatile(&dummy);
core::mem::forget(dummy);
x
}
}
5 changes: 0 additions & 5 deletions lexical-benchmark/parse-float/denormal30.rs
@@ -1,8 +1,3 @@
// Inline ASM was stabilized in 1.59.0.
// FIXME: Remove when the MSRV for Rustc >= 1.59.0.
#![allow(stable_features)]
#![cfg_attr(feature = "nightly", feature(asm))]

mod black_box;
use black_box::black_box;
use lexical_parse_float::FromLexical;
5 changes: 0 additions & 5 deletions lexical-benchmark/parse-float/denormal6400.rs
@@ -1,8 +1,3 @@
// Inline ASM was stabilized in 1.59.0.
// FIXME: Remove when the MSRV for Rustc >= 1.59.0.
#![allow(stable_features)]
#![cfg_attr(feature = "nightly", feature(asm))]

mod black_box;
use black_box::black_box;
use lexical_parse_float::FromLexical;
5 changes: 5 additions & 0 deletions lexical-benchmark/write-float/Cargo.toml
@@ -44,3 +44,8 @@ harness = false
name = "random"
path = "random.rs"
harness = false

[[bench]]
name = "special"
path = "special.rs"
harness = false
48 changes: 48 additions & 0 deletions lexical-benchmark/write-float/special.rs
@@ -0,0 +1,48 @@
#[macro_use]
mod input;

use core::mem;
use core::time::Duration;

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use lexical_write_float::ToLexical;

// Default random data size.
const COUNT: usize = 1000;

// BENCHES

macro_rules! gen_vec {
($exp_mask:expr, $i:ident, $f:ident) => {{
let mut vec: Vec<$f> = Vec::with_capacity(COUNT);
for _ in 0..COUNT {
let value = fastrand::$i($exp_mask..);
// NOTE: We want mem::transmute, not from_bits because we
// don't want the special handling of from_bits
#[allow(clippy::transmute_int_to_float)]
vec.push(unsafe { mem::transmute::<$i, $f>(value) });
}
vec
}};
}

macro_rules! bench {
($fn:ident, $name:literal) => {
fn $fn(criterion: &mut Criterion) {
let mut group = criterion.benchmark_group($name);
group.measurement_time(Duration::from_secs(5));
let exp32_mask: u32 = 0x7F800000;
let exp64_mask: u64 = 0x7FF0000000000000;

let f32_data = gen_vec!(exp32_mask, u32, f32);
let f64_data = gen_vec!(exp64_mask, u64, f64);

write_float_generator!(group, "f32", f32_data.iter(), format32);
write_float_generator!(group, "f64", f64_data.iter(), format64);
}
};
}

bench!(random_special, "random:special");
criterion_group!(special_benches, random_special);
criterion_main!(special_benches);
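
The exponent masks in this benchmark can be illustrated with a small standalone sketch. It assumes only the IEEE-754 layout: when every exponent bit is set, the value is either an infinity or a NaN. The helper names `is_special32` and `is_special64` are hypothetical and not part of the benchmark, and the sketch uses `from_bits` for brevity where the benchmark deliberately uses `mem::transmute`.

```rust
/// Exponent masks for IEEE-754 binary32 and binary64 (same values as the benchmark).
const EXP32_MASK: u32 = 0x7F80_0000;
const EXP64_MASK: u64 = 0x7FF0_0000_0000_0000;

/// Hypothetical helper: true if all exponent bits are set (Inf or NaN).
fn is_special32(bits: u32) -> bool {
    bits & EXP32_MASK == EXP32_MASK
}

/// Hypothetical helper: the 64-bit counterpart of `is_special32`.
fn is_special64(bits: u64) -> bool {
    bits & EXP64_MASK == EXP64_MASK
}

fn main() {
    // Exponent all ones, mantissa zero: positive infinity.
    assert!(f32::from_bits(EXP32_MASK).is_infinite());
    // Exponent all ones, mantissa non-zero: NaN.
    assert!(f32::from_bits(EXP32_MASK | 1).is_nan());
    // The sign bit does not affect the classification.
    assert!(is_special32(f32::NEG_INFINITY.to_bits()));
    // An ordinary finite value does not have every exponent bit set.
    assert!(!is_special32(1.0f32.to_bits()));
    assert!(is_special64(f64::NAN.to_bits()));
    assert!(!is_special64(0.5f64.to_bits()));
}
```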
17 changes: 0 additions & 17 deletions lexical-core/Cargo.toml
@@ -100,23 +100,6 @@ compact = [
"lexical-parse-integer?/compact",
"lexical-parse-float?/compact"
]
# Ensure only safe indexing is used.
# This is only relevant for the number writers, since the parsers
# are memory safe by default (and only use memory unsafety when
# is the trivial to prove correct).
safe = [
"lexical-write-integer?/safe",
"lexical-write-float?/safe",
"lexical-parse-integer?/safe",
"lexical-parse-float?/safe"
]
# Add support for nightly-only features.
nightly = [
"lexical-write-integer?/nightly",
"lexical-write-float?/nightly",
"lexical-parse-integer?/nightly",
"lexical-parse-float?/nightly"
]
# Enable support for 16-bit floats.
f16 = [
"lexical-util/f16",
5 changes: 0 additions & 5 deletions lexical-parse-float/Cargo.toml
@@ -68,11 +68,6 @@ compact = [
"lexical-util/compact",
"lexical-parse-integer/compact"
]
# Ensure only safe indexing is used. This is effectively a no-op, since all
# examples of potential memory unsafety are trivial to prove safe.
safe = ["lexical-parse-integer/safe"]
# Add support for nightly-only features.
nightly = ["lexical-parse-integer/nightly"]
# Enable support for 16-bit floats.
f16 = ["lexical-util/f16"]

8 changes: 0 additions & 8 deletions lexical-parse-float/src/bigint.rs
@@ -18,14 +18,6 @@ use crate::table::get_large_int_power;
/// # Safety
///
/// Safe if `index < array.len()`.
#[cfg(feature = "safe")]
macro_rules! index_unchecked {
($x:ident[$i:expr]) => {
$x[$i]
};
}

#[cfg(not(feature = "safe"))]
macro_rules! index_unchecked {
($x:ident[$i:expr]) => {
// SAFETY: safe if `index < array.len()`.
1 change: 0 additions & 1 deletion lexical-parse-float/src/fpu.rs
@@ -6,7 +6,6 @@
//!
//! It is therefore also subject to a Apache2.0/MIT license.
#![cfg(feature = "nightly")]
#![doc(hidden)]

pub use fpu_precision::set_precision;
2 changes: 0 additions & 2 deletions lexical-parse-float/src/lib.rs
@@ -30,8 +30,6 @@
//! * `radix` - Add support for strings of any radix.
//! * `format` - Add support for parsing custom integer formats.
//! * `compact` - Reduce code size at the cost of performance.
//! * `safe` - Ensure only memory-safe indexing is used.
//! * `nightly` - Enable assembly instructions to control FPU rounding modes.
//!
//! # Note
//!
13 changes: 0 additions & 13 deletions lexical-parse-float/src/libm.rs
@@ -28,19 +28,6 @@
/// # Safety
///
/// Safe if `index < array.len()`.
#[cfg(feature = "safe")]
macro_rules! i {
($x:ident, $i:expr) => {
$x[$i]
};
}

/// Index an array without bounds checking.
///
/// # Safety
///
/// Safe if `index < array.len()`.
#[cfg(not(feature = "safe"))]
macro_rules! i {
($x:ident, $i:expr) => {
unsafe { *$x.get_unchecked($i) }
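
As a rough illustration of how a macro like `i!` can sit behind a safe abstraction, the sketch below pairs it with a caller that establishes the `index < array.len()` precondition before indexing. The function `sum_prefix` is hypothetical and exists only for this example; it is not code from the crate.

```rust
/// Index a slice without bounds checking.
///
/// # Safety
///
/// Safe only if `$i < $x.len()`.
macro_rules! i {
    ($x:ident, $i:expr) => {
        // SAFETY: the caller must guarantee `$i < $x.len()`.
        unsafe { *$x.get_unchecked($i) }
    };
}

/// Hypothetical helper: sum the first `n` elements, or all of them if
/// the slice holds fewer than `n`.
fn sum_prefix(values: &[u32], n: usize) -> u32 {
    // Clamp once so every index used below is provably in range.
    let n = n.min(values.len());
    let mut total = 0u32;
    for idx in 0..n {
        // `idx < n <= values.len()`, so the precondition of `i!` holds.
        total = total.wrapping_add(i!(values, idx));
    }
    total
}

fn main() {
    assert_eq!(sum_prefix(&[1, 2, 3, 4], 2), 3);
    // An oversized `n` is clamped rather than reading out of bounds.
    assert_eq!(sum_prefix(&[1, 2, 3], 10), 6);
}
```

The point of the pattern is that the unsafe expression stays private while the public function cannot be called in a way that violates its precondition.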
3 changes: 0 additions & 3 deletions lexical-parse-float/src/number.rs
@@ -8,7 +8,6 @@
use lexical_util::format::NumberFormat;

use crate::float::RawFloat;
#[cfg(feature = "nightly")]
use crate::fpu::set_precision;

/// Representation of a number as the significant digits and exponent.
@@ -65,7 +64,6 @@ impl<'a> Number<'a> {
// function takes care of setting the precision on architectures which
// require setting it by changing the global state (like the control word of the
// x87 FPU).
#[cfg(feature = "nightly")]
let _cw = set_precision::<F>();

if self.is_fast_path::<F, FORMAT>() {
@@ -105,7 +103,6 @@ impl<'a> Number<'a> {
let format = NumberFormat::<FORMAT> {};
debug_assert!(format.mantissa_radix() == format.exponent_base());

#[cfg(feature = "nightly")]
let _cw = set_precision::<F>();

let radix = format.radix();
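
The `let _cw = set_precision::<F>();` binding suggests a guard value that restores global state when it goes out of scope. The sketch below shows that general RAII pattern under loud assumptions: a thread-local `Cell` stands in for the hardware control word, and `ControlWordGuard`, `set_control_word`, and `current` are hypothetical names. It does not touch the real x87 FPU and is not the crate's implementation.

```rust
use std::cell::Cell;

thread_local! {
    /// Stand-in for a hardware control word; purely illustrative.
    static CONTROL_WORD: Cell<u16> = Cell::new(0x037F);
}

/// Hypothetical guard that remembers the previous control word.
struct ControlWordGuard {
    previous: u16,
}

/// Hypothetical setter: installs `new` and returns a guard holding the old value.
fn set_control_word(new: u16) -> ControlWordGuard {
    let previous = CONTROL_WORD.with(|cw| cw.replace(new));
    ControlWordGuard { previous }
}

impl Drop for ControlWordGuard {
    fn drop(&mut self) {
        // Restore the saved value when the guard leaves scope.
        CONTROL_WORD.with(|cw| cw.set(self.previous));
    }
}

/// Read the simulated control word.
fn current() -> u16 {
    CONTROL_WORD.with(|cw| cw.get())
}

fn main() {
    assert_eq!(current(), 0x037F);
    {
        // Binding to `_cw` keeps the guard alive until the end of the block.
        let _cw = set_control_word(0x027F);
        assert_eq!(current(), 0x027F);
    }
    // The guard was dropped, so the previous value is back.
    assert_eq!(current(), 0x037F);
}
```

Note that binding to a named `_cw` rather than a bare `_` matters in this pattern: `let _ = ...` would drop the guard immediately and undo the change before any work runs.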
5 changes: 0 additions & 5 deletions lexical-parse-integer/Cargo.toml
@@ -46,11 +46,6 @@ radix = ["lexical-util/radix", "power-of-two"]
format = ["lexical-util/format"]
# Reduce code size at the cost of performance.
compact = ["lexical-util/compact"]
# Ensure only safe indexing is used. This is a no-op, since all
# examples of potential memory unsafety are trivial to prove safe.
safe = []
# Add support for nightly-only features.
nightly = []

# Internal only features.
# Enable the lint checks.
2 changes: 1 addition & 1 deletion lexical-util/src/algorithm.rs
@@ -4,7 +4,7 @@ use crate::num::Integer;

/// Copy bytes from source to destination.
///
/// This is only used in our compactt and radix integer formatted, so
/// This is only used in our compact and radix integer formatted, so
/// performance isn't the highest consideration here.
#[inline(always)]
#[cfg(feature = "write")]
9 changes: 9 additions & 0 deletions lexical-util/src/num.rs
@@ -713,6 +713,15 @@ pub trait Float: Number + ops::Neg<Output = Self> {
!self.is_odd()
}

/// Returns true if the float needs a negative sign when serializing it.
///
/// This is true if it's `-0.0` or it's below 0 and not NaN. But inf values
/// need the sign.
#[inline(always)]
fn needs_negative_sign(self) -> bool {
self.is_sign_negative() && !self.is_nan()
}

/// Get exponent component from the float.
#[inline(always)]
fn exponent(self) -> i32 {
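
The behavior of `needs_negative_sign` on edge cases can be checked with a free-function sketch over `f64`. This standalone version is an assumption for illustration (the trait method in the diff works on any `Float`), but it uses the same expression.

```rust
/// Standalone sketch of the trait method above, specialized to `f64`.
fn needs_negative_sign(value: f64) -> bool {
    value.is_sign_negative() && !value.is_nan()
}

fn main() {
    // Negative zero carries a sign bit, so it is written as `-0.0`.
    assert!(needs_negative_sign(-0.0));
    assert!(!needs_negative_sign(0.0));
    // Ordinary negative values and negative infinity need the sign.
    assert!(needs_negative_sign(-1.5));
    assert!(needs_negative_sign(f64::NEG_INFINITY));
    assert!(!needs_negative_sign(f64::INFINITY));
    // NaN never gets a sign, even when its sign bit is set.
    assert!(!needs_negative_sign(f64::NAN));
    assert!(!needs_negative_sign(-f64::NAN));
}
```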
4 changes: 2 additions & 2 deletions lexical-util/src/skip.rs
@@ -789,12 +789,12 @@ macro_rules! skip_iterator_bytesiter_base {

#[inline(always)]
fn read_u32(&self) -> Option<u32> {
unsafe { self.byte.read_u32() }
self.byte.read_u32()
}

#[inline(always)]
fn read_u64(&self) -> Option<u64> {
unsafe { self.byte.read_u64() }
self.byte.read_u64()
}

#[inline(always)]
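
One way a `read_u32` that returns `Option<u32>` can be made safe to call is to do the bounds check inside the function and return `None` when fewer than four bytes remain. The sketch below is a minimal standalone version over a byte slice; the free-function signature and the little-endian decoding are assumptions for illustration, not the crate's actual implementation.

```rust
use std::convert::TryFrom;

/// Hypothetical safe reader: decode 4 bytes starting at `index`, or `None`
/// if the slice is too short. Little-endian order is assumed here.
fn read_u32(bytes: &[u8], index: usize) -> Option<u32> {
    // `checked_add` guards against overflow for very large indices.
    let end = index.checked_add(4)?;
    // `get` performs the bounds check and yields `None` when out of range.
    let chunk = bytes.get(index..end)?;
    let array = <[u8; 4]>::try_from(chunk).ok()?;
    Some(u32::from_le_bytes(array))
}

fn main() {
    assert_eq!(read_u32(&[1, 0, 0, 0, 9], 0), Some(1));
    assert_eq!(read_u32(&[0, 0x78, 0x56, 0x34, 0x12], 1), Some(0x1234_5678));
    // Too few bytes: no read happens and the caller gets `None`.
    assert_eq!(read_u32(&[1, 2, 3], 0), None);
}
```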
6 changes: 0 additions & 6 deletions lexical-write-float/Cargo.toml
@@ -67,12 +67,6 @@ compact = [
"lexical-util/compact",
"lexical-write-integer/compact"
]
# Ensure only safe indexing is used.
# This is not enabled by default for writers, due to the performance
# costs, and since input can be easily validated to avoid buffer overwrites.
safe = ["lexical-write-integer/safe"]
# Add support for nightly-only features.
nightly = ["lexical-write-integer/nightly"]
# Enable support for 16-bit floats.
f16 = ["lexical-util/f16"]
