API proposal: Expose FileOptions.NoBuffering flag and support O_DIRECT on Linux #27408
Comments
Per my comments at dotnet/corefx#32314 (comment), I'm hesitant to see this exposed, as it looks like it would be very difficult to use this safely on unix. |
It is unfortunate that there is apparently no way to expose this safely due to the unstable unix semantics. What should happen now if client code passes 0x20000000 on unix? On Windows, this is passed through to the OS although it is undocumented. Should it be silently ignored on unix or should an exception be thrown to warn that the requested mode is not available? I don't know the answer but a conscious decision should be made. |
It only works on Windows and is ignored elsewhere. I tried reflection and other stuff, but FileStream accepts a file handle, and if you know why you want O_DIRECT you probably know how to invoke open natively. That's my lesson at least. |
What is the proposal now -- to expose for Windows, throw on Unix? |
We should probably expose it and throw for Unix (until we can come up with a safe implementation). I think that is better than having a "hidden" option for Windows. @pjanotti, what are your thoughts? |
My take is that we should be minimizing the differences between OSes. The fact that many exist doesn't mean that we should be adding more without a clear tangible benefit. My vote will be to not do anything until the Unix story is sorted out. |
@pjanotti To be clear, this particular one exists already; you just have to know it is there and that we check for the value. Given that both you and @stephentoub are hesitant, we should probably park this pending more feedback and/or a solid proposal for how to tackle the Unix end. |
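For readers unfamiliar with the "hidden" value being discussed, here is a minimal sketch of how calling code can name it. The `HiddenFileOptions` class and its members are hypothetical names for illustration; the only facts taken from the thread are the value `0x20000000` and that it is not a named member of `FileOptions`. The value matches the Win32 `FILE_FLAG_NO_BUFFERING` constant.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the BCL: gives a name to the
// undocumented value that the thread says is passed through on Windows.
public static class HiddenFileOptions
{
    // 0x20000000 is the value quoted in the thread
    // (the same number as Win32 FILE_FLAG_NO_BUFFERING).
    public const FileOptions NoBuffering = (FileOptions)0x20000000;

    // Adds the unbuffered flag to an existing set of options.
    public static FileOptions WithNoBuffering(FileOptions options)
        => options | NoBuffering;

    // Reports whether the unbuffered flag is present in the options.
    public static bool RequestsNoBuffering(FileOptions options)
        => (options & NoBuffering) != 0;
}
```

Per the comments above, a value built this way is honored on Windows and ignored on other platforms, so callers that depend on unbuffered semantics would still need their own platform check.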
As @buybackoff mentioned, it's currently supported on Windows, but not on Linux (the
We can easily expose it (by adding an enum value) and runtime/src/libraries/System.Private.CoreLib/src/System/IO/Strategies/FileStreamHelpers.Unix.cs Line 49 in 2223bab
The problem is that on both Linux and Windows it would still be hard to use, as the underlying OS mechanisms have non-trivial requirements:
Problems:
|
For .NET 6.0 I would vote to improve the performance of Marshal.Alloc. It seems like, since the current perf is so bad, it wouldn't be very hard to improve it, and it would benefit everyone who's currently using AllocateHGlobal. We are currently adding the regions support. When that's ready (which will not be this release) it would make adding this much easier in the GC. We can allocate regions (which are much smaller units of memory) for objects for each different alignment and collect them not just in gen2 GCs. We will also be implementing another parameter in the AllocateArray API, which is the generation (again, much easier with regions). Does this sound reasonable? |
I do not think that this is a correct conclusion. On Unix, the performance of On Windows, there is some fixed overhead due to Windows APIs that we historically used to implement it. This overhead is typically negligible if you are using |
You are saying the perf is not a blocker on Windows, so I'm proposing to simply use Marshal.Alloc as suggested in #33244 for .NET 6. |
Isn't the problem with introducing NoBuffering that certain conditions need to be met (memory alignment, sector-wise reading, etc.; see https://docs.microsoft.com/en-us/windows/win32/fileio/file-buffering)? I would love to see this, but hiding the internals could mean that you don't get the full benefit out of it. |
@adamsitnik asked me to comment here; my apologies for saying the obvious stuff and not contributing meaningfully to the technical discussion. NoBuffering is a very powerful but still obscure option that most people don't know about, so they sadly leave perf on the table or even struggle to get things running without it (as happened to us 15 years ago). We use it as the default for our file IO, with rare exceptions when we know that Windows caching would actually be beneficial. I don't know Linux well enough, but it would be very handy to have a degree of support on it also. Ideally the main FileStream object should deal with all the alignment issues (.NET 6 now supports aligned allocations) so that the user does not have to care (we created our own wrapper for that), and the existing code 0x20000000 should mean the same there. To drive adoption of this feature we really need a good blog post on the .NET Blog demonstrating its value, especially for network writes in scenarios where the amount of data written exceeds available memory on the target box and some software running there has also got plenty of memory allocated. Having it swapped to disk is one hell of a killer even on NVMes, and even disabling swap can still lead to exhaustion of resources. |
@adamsitnik following up on the discussion at #86836, here are some ideas towards supporting direct I/O on Windows and Linux. So the first order of business is to create an abstraction over
using System.Runtime.InteropServices;
// represents a block of memory that is aligned to a specified byte boundary.
public unsafe class AlignedMemoryBlock: IDisposable
{
// size of the block in bytes.
public int Size { get; }
// alignment of the block in bytes.
public int Alignment { get; }
// pointer to the block's bytes in memory.
public IntPtr Pointer { get; }
// cast the block's bytes to a span of the given type.
public Span<T> AsSpan<T>() where T : unmanaged => new(Pointer.ToPointer(), Size / sizeof(T));
// create a new aligned memory block with the given size and alignment.
public AlignedMemoryBlock(int ArgSize, int ArgAlignment) {
// store the size and alignment (handy for unit testing).
(Size, Alignment) = (ArgSize, ArgAlignment);
// allocate the block.
Pointer = new IntPtr(NativeMemory.AlignedAlloc((nuint)ArgSize, (nuint)ArgAlignment));
}
// free the block.
public void Dispose() => NativeMemory.AlignedFree(Pointer.ToPointer());
}
// represents a block of memory that is aligned to the sector size of the platform.
public class SectorAlignedMemoryBlock: AlignedMemoryBlock
{
// get the sector size of the platform (here we just assume it's 512 bytes).
public static int GetPlatformSectorSize() => 512;
// create a new sector-aligned memory block with the given size.
public SectorAlignedMemoryBlock(int ArgSize) : base(ArgSize, GetPlatformSectorSize()) { }
}
Note that the above Next up we need a way to open a file with the right attributes for direct I/O using
The
// handle to files opened for direct block I/O operations.
public sealed class SafeBlockFileHandle: IDisposable
{
/// <inheritdoc />
public void Dispose() { }
}
public partial class File
{
// open a file for direct block I/O operations. It should have most of the same arguments as the current
// File.OpenHandle() function but enforce the direct I/O flags (e.g. O_DIRECT | O_SYNC on Linux).
public static SafeBlockFileHandle OpenBlockHandle(string ArgPath) => new SafeBlockFileHandle();
}
Finally we need functions to read and write blocks directly to/from disk. I think we can just add these as overloads to the
public partial class RandomAccess
{
// reads a block of bytes from the given file handle at the given offset into the given buffer.
public static ValueTask<int> ReadAsync(SafeBlockFileHandle ArgHandle, SectorAlignedMemoryBlock ArgBuffer, long ArgOffset) {
throw new NotImplementedException();
}
// writes a block of bytes to the given file handle at the given offset from the given buffer.
public static ValueTask WriteAsync(SafeBlockFileHandle ArgHandle, SectorAlignedMemoryBlock ArgBuffer, long ArgOffset) {
throw new NotImplementedException();
}
}
With all the pieces in place, usage will look something like this:
// open a file for direct block I/O operations.
using var file = File.OpenBlockHandle(@"c:\my-binary-file.dat");
// allocate a 4KB block.
using var block = new SectorAlignedMemoryBlock(4096);
// read data from the file into the block.
await RandomAccess.ReadAsync(file, block, ArgOffset: 0);
Remaining issues/questions to address:
Last but not least: I salute the .NET team for the amazing work you all do. This is just one small set of APIs and it's taken me several hours to type all this up, I can't imagine the insane amount of hours required to churn out all those new APIs we get with each version of .NET! |
This could be avoided by extending So to open the handle you would need to pass
We already have a helper that derives from runtime/src/libraries/System.IO.FileSystem/tests/RandomAccess/SectorAlignedMemory.Windows.cs Line 12 in 7bcd509
But it's internal. But overall I agree that if we want to expose
Yes, IIRC it needs to be a multiple of the volume sector size: https://learn.microsoft.com/en-us/windows/win32/fileio/file-buffering#alignment-and-file-access-requirements
You could always rent a bigger array, pin it so the address does not change if a GC happens in the meantime, get the first aligned address in it and use only a slice of it by, for example, creating a
@ericmutta thank you for your feedback! |
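The "rent a bigger array, pin it, slice at the first aligned address" idea from the comment above could be sketched roughly as follows. `PinnedAlignedBuffer` and its members are hypothetical names for illustration and not an existing .NET type; the sketch assumes the alignment is a power of two and uses only `ArrayPool<byte>`, `GCHandle`, and `Memory<byte>`.

```csharp
using System;
using System.Buffers;
using System.Runtime.InteropServices;

// Hypothetical wrapper: rents an oversized array, pins it, and exposes
// only the slice that starts at the first suitably aligned address.
public sealed class PinnedAlignedBuffer : IDisposable
{
    private byte[] _rented;
    private GCHandle _pin;
    private readonly int _offset;
    private readonly int _length;

    public PinnedAlignedBuffer(int length, int alignment)
    {
        if (length <= 0)
            throw new ArgumentOutOfRangeException(nameof(length));
        if (alignment <= 0 || (alignment & (alignment - 1)) != 0)
            throw new ArgumentException("Alignment must be a power of two.", nameof(alignment));

        // Over-allocate so an aligned start always fits inside the array.
        _rented = ArrayPool<byte>.Shared.Rent(length + alignment - 1);
        // Pin so the GC cannot move the array while the address is in use.
        _pin = GCHandle.Alloc(_rented, GCHandleType.Pinned);

        long address = _pin.AddrOfPinnedObject().ToInt64();
        long misalignment = address & (alignment - 1);
        _offset = misalignment == 0 ? 0 : (int)(alignment - misalignment);
        _length = length;
    }

    // Address of the first aligned byte inside the pinned array.
    public IntPtr AlignedAddress => _pin.AddrOfPinnedObject() + _offset;

    // Aligned slice of the requested length; valid until Dispose.
    public Memory<byte> Memory => _rented.AsMemory(_offset, _length);

    public void Dispose()
    {
        if (_rented == null) return;
        _pin.Free();
        ArrayPool<byte>.Shared.Return(_rented);
        _rented = null;
    }
}
```

Compared with `NativeMemory.AlignedAlloc`, this keeps the buffer in managed memory at the cost of pinning it for its whole lifetime, which can fragment the heap if many such buffers are alive at once.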
Thanks for following up! Some questions here: if I pass My thinking behind suggesting I appreciate all the work you are doing on this and look forward to seeing how it evolves to the final result 🙏 |
@ericmutta the OS performs the validation for every call anyway. But it reports quite a generic error (invalid parameter), and then our job would be to check the alignment and offset and provide a more detailed error message. |
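A rough sketch of the kind of alignment and offset check described above, producing a specific message instead of a generic "invalid parameter". `DirectIoValidation.ThrowIfMisaligned` is a hypothetical helper, not a .NET API, and the sector size is assumed to be supplied by the caller (the thread's own example simply assumes 512 bytes). The three conditions follow the Windows file-buffering requirements linked earlier in the thread.

```csharp
using System;

// Hypothetical helper: checks the three unbuffered I/O requirements
// and reports which one was violated.
public static class DirectIoValidation
{
    public static void ThrowIfMisaligned(IntPtr bufferAddress, int byteCount, long fileOffset, int sectorSize)
    {
        if (sectorSize <= 0 || (sectorSize & (sectorSize - 1)) != 0)
            throw new ArgumentException("Sector size must be a power of two.", nameof(sectorSize));

        // The file offset must be a multiple of the sector size.
        if (fileOffset < 0 || fileOffset % sectorSize != 0)
            throw new ArgumentException(
                $"File offset {fileOffset} is not a non-negative multiple of the sector size ({sectorSize}).",
                nameof(fileOffset));

        // The number of bytes transferred must be a multiple of the sector size.
        if (byteCount <= 0 || byteCount % sectorSize != 0)
            throw new ArgumentException(
                $"Byte count {byteCount} is not a positive multiple of the sector size ({sectorSize}).",
                nameof(byteCount));

        // The buffer must start on a sector-aligned address.
        if ((bufferAddress.ToInt64() & (sectorSize - 1)) != 0)
            throw new ArgumentException(
                $"Buffer address 0x{bufferAddress.ToInt64():X} is not aligned to {sectorSize} bytes.",
                nameof(bufferAddress));
    }
}
```

Running a check like this before the system call costs a few integer operations per request, and the OS still performs its own validation afterwards.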
Proposed API
This flag is hidden but supported on Windows by value already (see https://github.com/dotnet/coreclr/blob/75e62c545ac5c7195bf846b47e28c4f27736d64c/src/System.Private.CoreLib/shared/System/IO/FileStream.cs#L204).
This proposal is to make it part of the public surface and implement it on Unix by mapping it to O_DIRECT.
Continuation of https://github.com/dotnet/coreclr/issues/19229 and dotnet/coreclr#19232.
Expose FileOptions.NoBuffering as a part of the API. It is already unofficially supported on Windows. Just uncomment this line and add O_DIRECT support for Linux.
PR for the changes.