Skip to content
This repository has been archived by the owner on Oct 1, 2023. It is now read-only.

Add Filesystem\unlink(), and similar operations that aren't related to handles #29

Open
fredemmott opened this issue Dec 14, 2018 · 17 comments

Comments

@fredemmott
Copy link
Contributor

No description provided.

@fredemmott
Copy link
Contributor Author

fredemmott commented Oct 28, 2019

Current thinking:

  • File\ will be reserved for actual file handle/fd operations
  • Filesystem\ is a potential place for things that operate on the filesystem but do not involve a specific fd, or require that they are operating on a plain file - e.g. Filesystem\rm_rf (not actually calling a function that though...)
  • It is probably necessary for some behaviors to expose the posix behavior thinly, e.g. OS\readdir returning dirent as a shape
  • however, if we do that, where do convenience functions go? OS\readdir (POSIX) and OS\readdir_recursive? (convenience function) OS\readdir and Filesystem\readdir_recursive?
  • should Filesystem\ functions return types return the OS\ thin-wrappers-over-posix or an abstraction?
  • say they return an abstraction, with Filesystem\readdir returning a FileInfo object shared with Filesystem\stat - what does FileInfo::isExecutable() do if it was created by a readdir, as the dirent does not include that information?
    • silent stat() syscall?
    • throw an exception?
    • have separate DirFileInfo vs FileInfo with a common base interface? Leaky abstraction
  • probably best to just implement OS\ as a thin posix wrapper and leave Filesystem\ for third-party projects; review what third-party projects build and reconsider later.

@fredemmott fredemmott changed the title Add Filesystem\unlink() Add Filesystem\unlink(), and similar operations that aren't related to handles Oct 28, 2019
@azjezz
Copy link
Contributor

azjezz commented Oct 28, 2019

File\ will be reserved for actual file handle/fd operations

agree

Filesystem\ is a potential place for things that operate on the filesystem but do not involve a specific fd, or require that they are operating on a plain file - e.g. Filesystem\rm_rf (not actually calling a function that though...)

agree

It is probably necessary for some behaviors to expose the posix behavior thinly, e.g. OS\readdir returning dirent as a shape

how would the dirent shape look like ? match the linux c structure ? i.e :

struct dirent {
    ino_t          d_ino;
    off_t          d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[256];
};

however, if we do that, where do convenience functions go? OS\readdir (POSIX) and OS\readdir_recursive? (convenience function) OS\readdir and Filesystem\readdir_recursive?

OS\

should Filesystem\ functions return types return the OS\ thin-wrappers-over-posix or an abstraction?

an abstraction, let's keep OS\ for low level things and the you don't need when developing an average website ( which is what hack is mostly used for, even if that's not the goal of hack )

see : https://github.com/nuxed/filesystem

say they return an abstraction, with Filesystem\readdir returning a FileInfo object shared with Filesystem\stat - what does FileInfo::isExecutable() do if it was created by a readdir, as the dirent does not include that information?

silent stat() syscall?
throw an exception?
have separate DirFileInfo vs FileInfo with a common base interface? Leaky abstraction

silent stat() syscall. no need to leak implementation details. if it is possible to determine if the node is executable, we do so.

probably best to just implement OS\ as a thin posix wrapper and leave Filesystem\ for third-party projects; review what third-party projects build and reconsider later.

agree, IMHO i don't think Filesystem\ is urgent, OS\ is more important at the moment ( i would also like to stop using php stdlib in https://github.com/nuxed/filesystem ) .

@jjergus
Copy link
Contributor

jjergus commented Oct 28, 2019

I don't like the distinction between File and Filesystem, why not File\unlink? "Everything in File must work with File\Handles" feels like an unnecessary arbitrary restriction to me.

silent stat() syscall

I don't know -- in general I don't think we should hide potentially expensive operations that can fail inside what looks like a getter method, but I don't know if there's a better option here. It might be OK depending on the exact API (e.g. if it clearly doesn't look like a getter method).

If possible, I would maybe try to do the opposite: Have a function that clearly looks like an expensive operation that can fail, but sometimes it returns a value without actually doing the operation if the value is somehow already available.


I am starting to be a bit confused about the namespaces, OS vs IO vs File/TCP/etc. I'm not sure if it makes sense to simplify that or if that's necessary, but some random ideas:

Would it be crazy to merge OS and IO? We already settled on (I think) doing that for exceptions. Are there any OS operations that could not be reasonably described as IO? If not, then IO might be the preferable name, it seems more common than OS.

As a user of the language, I would really rather not end up in a situation where there are 4 different namespaces in which a function like "unlink" could reasonably exist (OS, IO, Filesystem, File).

@fredemmott
Copy link
Contributor Author

how would the dirent shape look like ? match the linux c structure ?

Well-typed common subset of (macos & linux)

however, if we do that, where do convenience functions go? OS\readdir (POSIX) and OS\readdir_recursive? (convenience function) OS\readdir and Filesystem\readdir_recursive?

That'd probably be the best for user expectations, but it does mean that we'd have non-OS-defined functions in OS; not sure that's right

an abstraction, let's keep OS\ for low level things and the you don't need when developing an average website ( which is what hack is mostly used for, even if that's not the goal of hack )

While I agree with the conclusion, and while current usage must be covered adequately, we are very definitely targeting the long-term goals with this design, even if it ends up having some negative effects for current uses.

silent stat() syscall. no need to leak implementation details. if it is possible to determine if the node is executable, we do so.

Very strongly against this - it is necessary to leak this particular one as "I accidentally called stat a million times" is a reasonably common performance problem. It should be explicit.


I don't like the distinction between File and Filesystem, why not File\unlink? "Everything in File must work with File\Handles" feels like an unnecessary arbitrary restriction to me.

File\chmod($directory) feels strange to me; I'm also worried that we might end up needing to put File back into classnames.


Would it be crazy to merge OS and IO? We already settled on (I think) doing that for exceptions.

Not crazy - though I feel that in a way OS\ is more well-defined: thin wrappers. I'd expect IO functions in OS\ to be a thinner layer over FDs, and not to provide functions like writeAsync(), requiring exlpicit C/async instead.

Perhaps that idea of OS\ might be better under _Private or something.

Also, if OS\ will contain non-IO stuff, it would be good to keep the mess of IO\ interfaces out of there for readability.

Are there any OS operations that could not be reasonably described as IO? If not, then IO might be the preferable name, it seems more common than OS.

Python's OS module seems pretty much the same thing (low-level wrappers of POSIX stuff); some examples:

  • getpid, getlogin, getuid, nice
  • strerror (errno -> string)
  • umask

There's some debatable cases like getcwd, exec, fork, kill

As a user of the language, I would really rather not end up in a situation where there are 4 different namespaces in which a function like "unlink" could reasonably exist (OS, IO, Filesystem, File).

Definitely agree. Unless we merge IO\ and OS, I think it'd be good to strongly discourage using the stuff in OS\ except where necessary - but recognize it is necessary for completeness.


Quoting myself here:

but recognize it is necessary for completeness.

This isn't necessarily true if we ship C FFI - but the HSL won't be able to use that, and we should expect other deployments to disable it in libraries out of security concerns

@jjergus
Copy link
Contributor

jjergus commented Oct 28, 2019

Perhaps that idea of OS\ might be better under _Private or something.

That sounds reasonable to me. What would be the disadvantages? Are there legitimate use-cases for people using low-level functions for anything where we provide a higher-level abstraction? (e.g. performance?)

File\chmod($directory) feels strange to me

I think I'd rather have a few slightly strange-feeling functions like this, than a library where people have to remember which functions are in File vs Filesystem based on how strange each one felt to the library authors :)

@fredemmott
Copy link
Contributor Author

fredemmott commented Oct 28, 2019

That sounds reasonable to me. What would be the disadvantages? Are there legitimate use-cases for people using low-level functions for anything where we provide a higher-level abstraction? (e.g. performance?)

Depends on if we tradeoff usability for completeness; for example, PHP and OCaml's readdir and scandir functions just return a list of filenames instead of dirent - if you want filenames and types, you need to do a function-call-per-file to find out; I've got a diff up where it looks like I'm making find in hh_server ~2x faster by switching to the posix call.

Also, if we want to be able to cover systems programming, we probably need to expose the POSIX-like behaviors (and there are non-technical restrictions on what we can name that namespace which is why errno is in OS); they're very specifically defined in annoying ways, but sometimes you need that (e.g. I needed that when implementing async socket connections); I think if we aim to be sufficiently complete in a higher-level abstraction, we're going to fail, and have a lot of hard-to-debug problems.

I think I'd rather have a few slightly strange-feeling functions like this, than a library where people have to remember which functions are in File vs Filesystem based on how strange each one felt to the library authors :)

Yeah... I need to figure out if there's an underlying reason that can be consistently applied, or if I'm just being weird about that :)

@jjergus
Copy link
Contributor

jjergus commented Oct 28, 2019

Yeah I guess if one of the namespaces (OS?) is very explicitly for low-level things that almost no one needs, and anything that is not strictly equivalent to a low-level call goes to a different namespace, that seems pretty reasonable. "If you have to ask than this is not the namespace you want" is an easy enough rule for people to remember :)

@fredemmott
Copy link
Contributor Author

fredemmott commented Oct 28, 2019

Does "almost no one ever handles IO failures" count as "almost no one needs OS\Errno"? 😅

@fredemmott
Copy link
Contributor Author

fredemmott commented Oct 28, 2019

So, are we leaning towards:

  • OS\readdir($path) - exactly that name - returning vec<dirent> - and discouraging it's direct use
  • File\list_directory - name open to bikeshedding - taking an options shape (filter func, skip . and .., recursive/non-recursive) - and with return value also open to bikeshedding?

Okay, actually a little bit of bikeshedding about readdir: I could go a step thinner:

using $dir = OS\opendir($path) {
  while ($dirent = OS\readdir($dir)) {
    // ...
  }
}

This would be very slightly better if you have a huge directory and want to exit early.
It would be worse otherwise - many more C<->Hack calls.

IMO we should generally consider the 'returns each in turn then returns nullptr' C pattern to be 'the block as a whole returns a vec', and if there is a case where it turns out to matter, add both forms.

@fredemmott
Copy link
Contributor Author

fredemmott commented Oct 28, 2019

Actually that's another case of my fuzzy feelings rather than consistent principle. Let's go for as thin as practical, and add the slightly-less-thin versions if needed after profiling.

so, OS\opendir($path), OS\readdir($dir), add OS\readdir_vec($path) or something later if it turns out to actually be a problem.

@jjergus
Copy link
Contributor

jjergus commented Oct 28, 2019

Sounds reasonable. Also eventually the OS namespace should probably be built-in, not part of HSL, right? We don't want wrappers like

namespace OS {
  function opendir($path) {
    return \opendir($path);
  }
}

in the long term (might be needed temporarily).

@azjezz
Copy link
Contributor

azjezz commented Oct 28, 2019

Sounds reasonable. Also eventually the OS namespace should probably be built-in, not part of HSL, right?

IMO, IO\, File\, Network\, TCP\, OS\ ... etc, all should be built-in instead of built in top of php stdlib

@fredemmott
Copy link
Contributor Author

built-in

IMO ideally the whole HSL should be built-in :)

  • this is especially true for IO: either HSL IO is becoming built-in when it leaves experimental, or I'm adding a way to make destructors no-op for resources opened by the HSL - I do not want HSL IO launched with refcounting as an observable behavior.
  • even more true for OS; using the readdir/opendir examples, they can not be implemented as wrappers around the PHP builtins without a bunch of hidden `stat()s. I'll do it for experimental\ to get the API 'right', but absolutely will not be committing that to the main HSL.

I suspect what I'll end up doing to 'release' HSL IO is to add a new _Private\Native\FileDescriptor type, and add HH\Lib\OS\fwrite(FileDescriptor, ...) etc, with none of the PHP behaviors. Won't directly use ints, as that would allow requests to access each other's FDs.

@jjergus
Copy link
Contributor

jjergus commented Oct 28, 2019

IMO ideally the whole HSL should be built-in :)

I meant "built-in" as in written in C++ with Hack code only in .hhi files. I don't think we need that for the whole HSL, but definitely the whole HSL should be "built-in" as in shipped with HHVM.

@fredemmott
Copy link
Contributor Author

I dream of a world where the PHP stdlib needs a runtime option to be available at all - and the HSL doesn't depend on it :)

@azjezz
Copy link
Contributor

azjezz commented Oct 28, 2019

I dream of a world where the PHP stdlib is not available at all - and the HSL doesn't depend on it :)*

@azjezz
Copy link
Contributor

azjezz commented Jun 26, 2021

i have implemented this in PSL, it is built in top of PHP stdlib, but could provide a reference for naming.. etc. https://github.com/azjezz/psl/tree/1.8.x/src/Psl/Filesystem

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants