Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Request] add support for project id (i.e. lsattr -p) #553

Open
markmoe19 opened this issue Sep 8, 2023 · 20 comments · May be fixed by #594
Open

[Request] add support for project id (i.e. lsattr -p) #553

markmoe19 opened this issue Sep 8, 2023 · 20 comments · May be fixed by #594

Comments

@markmoe19
Copy link

dwalk supports group and gid. Could it also support project and project id? See "/usr/bin/lsattr -p" for project id linux attribute on a file. We make use of gid for access and project id is often the same but sometimes different and used more for quota control. Thank you.

@adammoody
Copy link
Member

adammoody commented Oct 3, 2023

I haven't used project ids much myself.

Do you know if there is a C API to access that info?

Are those store as extended attributes maybe, and if so, do you know the attribute name?

As with the dwalk formatting change, we could hack something up to get you a quick solution here. If there is a C API, we could likely just call that at the point where the text lines are formatted. If there is not a C API, we could try to invoke and parse the lsattr command through a popen() call instead, though I don't know how well popen() calls are supported in MPI environments (forking procs sometimes causes problems).

@adammoody
Copy link
Member

adammoody commented Oct 3, 2023

For reference, I see the lsattr man page points to sourceforge

lsattr is part of the e2fsprogs package and is available from http://e2fsprogs.sourceforge.net.

which then points to this git repo:

https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/tree/lib

Looks like this is the code that reads the file project id:

https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/tree/lib/e2p/fgetproject.c#n42

the fsxattr struct and FS_IOC_FSGETXATTR constant are defined here:

https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git/tree/lib/e2p/project.h

@adammoody
Copy link
Member

Another reference to project id info on Lustre: https://jira.whamcloud.com/browse/LU-4017

@ofaaland , do you know offhand if there is an API to read a file project id (lsattr -p) in Lustre?

@adilger
Copy link
Contributor

adilger commented Oct 4, 2023

@adammoody there is an ioctl(FS_IOC_FSGETXATTR) interface to get the projid of a file/directory, and similarly ioctl(FS_IOC_FSSETXATTR) to set the projid on files/dirs and FS_FL_PROJINHERIT flag on directories.

This same ioctl is implemented for Lustre, XFS (original), and other filesystems that support project IDs (possibly ZFS also). See, for example lustre/utils/lfs_project.c::project_get_fsxattr() in the Lustre tree. The struct and constants are already declared in the Linux kernel <Linux/fs.h> header.

I definitely would not recommend to do an lsattr system call to get this information, as that is a ton of overhead, and makes the code dependent on the installation of e2fsprogs and the exact syntax and output of that tool. While it is unlikely to change, that adds needless complexity.

@adammoody
Copy link
Member

adammoody commented Oct 4, 2023

Thank you, @adilger ! That's perfect.

@markmoe19 , for a quick fix, you could hack this in to the text formatting function that you modified in #555.

Here is some example code I whipped up based on the tips above that I think will work:

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <stdint.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#include <sys/ioctl.h>
#include <linux/fs.h>

static int get_projid(const char* path, uint32_t* id)
{
  int ret = 0;
  int fd = open(path, O_RDONLY);
  if (fd >= 0) {
    struct fsxattr attr;
    int rc = ioctl(fd, FS_IOC_FSGETXATTR, &attr);
    if (rc == 0) {
      *id = (uint32_t) attr.fsx_projid;
    } else {
      printf("ioctl failed %s %s (errno=%d)\n", path, strerror(errno), errno);
      ret = -1;
    }
    close(fd);
  } else {
    printf("open failed %s %s (errno=%d)\n", path, strerror(errno), errno);
    ret = -1;
  }
  return ret;
}

int main(int argc, char* argv[])
{
  char path[] = "/p/lustre/user123/foo.dat";

  uint32_t id;
  int rc = get_projid(path, &id);
  if (rc == 0) {
    printf("%s %llu\n", path, (unsigned long long)id);
  }

  return 0;
}

You can add the definition for get_projid as above to src/common/mfu_flist_io.c and then call get_projid(file, &id) for each file within print_file_text().

That will execute twice for each file, once to fetch the value when computing the number of characters needed, and once more to fetch the value for actual writing to the file. That obviously could be improved by caching the value, but at least it would work to just do it twice.

Since this is an uint32_t type, I suppose it'd be best to print with "%"PRIu32.

@adilger
Copy link
Contributor

adilger commented Oct 5, 2023

Opening and closing the file twice is not free, even in newer Lustre releases that have automatic open cache on the client. Ideally, you would open an fd on the path once at the starts, and pass the fd to the various functions. That also avoids TOCTOU race when doing pathname resolution multiple times.

@adammoody
Copy link
Member

Reducing the open calls is a good point, especially if you are dealing with many files, @markmoe19 . If that's a problem, the most immediate improvement would be to pull the project id lookup up into mfu_flist_write_text() for all files and then pass those project id values into print_file_text(). I'd go ahead and try the double-open hack first at least to check that it does what you need. Then we can optimize that if needed.

@adammoody
Copy link
Member

adammoody commented Oct 6, 2023

In case you want to jump straight to the single-open version, here is a patch:

diff --git a/src/common/mfu_flist_io.c b/src/common/mfu_flist_io.c
index 0b0a2e5..7acb60d 100644
--- a/src/common/mfu_flist_io.c
+++ b/src/common/mfu_flist_io.c
@@ -22,6 +22,10 @@
 #include <errno.h>
 #include <string.h>
 
+/* to read file project id via ioctl */
+#include <sys/ioctl.h>
+#include <linux/fs.h>
+
 #include "dtcmp.h"
 #include "mfu.h"
 #include "mfu_flist_internal.h"
@@ -1591,9 +1595,29 @@ void mfu_flist_write_cache(
     return;
 }
 
+static int get_projid(const char* path, uint32_t* id)
+{
+    int ret = -1;
+    int fd = open(path, O_RDONLY);
+    if (fd >= 0) {
+        struct fsxattr attr;
+        int rc = ioctl(fd, FS_IOC_FSGETXATTR, &attr);
+        if (rc == 0) {
+            *id = (uint32_t) attr.fsx_projid;
+            ret = 0;
+        } else {
+            MFU_LOG(MFU_LOG_ERR, "ioctl failed: `%s' errno=%d (%s)", path, errno, strerror(errno));
+        }
+        close(fd);
+    } else {
+        MFU_LOG(MFU_LOG_ERR, "open failed: `%s' errno=%d (%s)", path, errno, strerror(errno));
+    }
+    return ret;
+}
+
 /* TODO: move this somewhere or modify existing print_file */
 /* print information about a file given the index and rank (used in print_files) */
-static size_t print_file_text(mfu_flist flist, uint64_t idx, char* buffer, size_t bufsize)
+static size_t print_file_text(mfu_flist flist, uint64_t idx, const uint32_t* projids, char* buffer, size_t bufsize)
 {
     size_t numbytes = 0;
 
@@ -1640,9 +1664,11 @@ static size_t print_file_text(mfu_flist flist, uint64_t idx, char* buffer, size_
         const char* size_units;
         mfu_format_bytes(size, &size_tmp, &size_units);
 
-        numbytes = snprintf(buffer, bufsize, "%s %s %s %7.3f %3s %s %s\n",
+        uint32_t projid = projids[idx];
+
+        numbytes = snprintf(buffer, bufsize, "%s %s %s %7.3f %3s %s %lu %s\n",
             mode_format, username, groupname,
-            size_tmp, size_units, modify_s, file
+            size_tmp, size_units, modify_s, (unsigned long)projid, file
         );
     }
     else {
@@ -1690,12 +1716,23 @@ void mfu_flist_write_text(
         MFU_LOG(MFU_LOG_INFO, "Writing to output file: %s", name);
     }
 
-    /* compute size of buffer needed to hold all data */
-    size_t bufsize = 0;
     uint64_t idx;
     uint64_t size = mfu_flist_size(flist);
+
+    /* lookup file project id values */
+    uint32_t* projids = (uint32_t*) MFU_MALLOC(size * sizeof(uint32_t));
+    for (idx = 0; idx < size; idx++) {
+        const char* file = mfu_flist_file_get_name(flist, idx);
+        uint32_t projid;
+        if (get_projid(file, &projid) == 0) {
+            projids[idx] = projid;
+        }
+    }
+
+    /* compute size of buffer needed to hold all data */
+    size_t bufsize = 0;
     for (idx = 0; idx < size; idx++) {
-        size_t count = print_file_text(flist, idx, NULL, 0);
+        size_t count = print_file_text(flist, idx, projids, NULL, 0);
         bufsize += count + 1;
     }
 
@@ -1706,7 +1743,7 @@ void mfu_flist_write_text(
     char* ptr = buf;
     size_t total = 0;
     for (idx = 0; idx < size; idx++) {
-        size_t count = print_file_text(flist, idx, ptr, bufsize - total);
+        size_t count = print_file_text(flist, idx, projids, ptr, bufsize - total);
         total += count;
         ptr += count;
     }
@@ -1811,6 +1848,9 @@ void mfu_flist_write_text(
     /* free buffer */
     mfu_free(&buf);
 
+    /* free buffer of project ids */
+    mfu_free(&projids);
+
     /* end timer */
     double end_write = MPI_Wtime();
 

I can push up a branch if you'd like.

This code doesn't properly handle the case when get_projid() returns an error. It should at least print an error message to stdout, but it will use an undefined value as the file's project id. To better handle that error case, we should also be tracking whether each entry in the projids array is actually valid.

Perhaps change this to use a signed datatype and then use -1 to represent an invalid project id? Or otherwise use a second array or an array of structs or something to denote whether the corresponding entry is valid...

Also, when the project id has not been explicitly set (or inherited) on a file, does the project id default to 0? Anyone know?

@adammoody
Copy link
Member

Went ahead and pushed this a branch to make it easier to try:

https://github.com/hpc/mpifileutils/tree/projid

and here are the actual changes:

main...projid

@markmoe19
Copy link
Author

That works! I get a lot of projid='0', but project id is not set for all the files so that makes sense. I do get 10001, 10002, and 20034 as some other example project-ids.

@markmoe19
Copy link
Author

questions:

  1. what happens when the .mfu file doesn't contain the projid data? Or is that not part of the .mfu file yet? Since projid might be a new addition to the .mfu file, could it just be assumed to be value "0" for any .mfu files that don't contain the project id?

  2. sometimes I get an error like this:

[2023-10-09T12:45:00] [44] [/project/selene-admin/mpifileutils_debug/mpifileutils-v0.11.1/mpifileutils/src/common/mfu_flist_io.c:1613] ERROR: ioctl failed: `/lustre/fsw/.../system/rc.service' errno=25 (Inappropriate ioctl for device)

I noticed that in those cases, "lsattr -p" also gives an error like this:

root# lsattr -p /lustre/fsw/.../system/rc.service
lsattr: Operation not supported While reading flags on /lustre/fsw/.../system/rc.service
root#

Could that error just be ignored and also end up with project id value of '0'?

Thanks,

  • Mark

@adammoody
Copy link
Member

Right, we don't have a field for the project id within the internal flist structure that gets stored to the binary .mfu file. This means that it's not recording the project id value within the binary file.

The above hack can still work when reading a binary .mfu file as input, since it executes the project id lookup at the point it writes the text version of the list. However, you'd need to run from a system that has access to the file system(s) that were walked when generating the original .mfu file.

Adding the project id to the flist element would take some effort, because that structure needs to be updated in multiple places throughout the code base. It's also a bit disruptive since it forces a change to the .mfu file format.

As a future feature, I'd like to see whether we can enable users to attach arbitrary data to each file element. This data would need to be serializable to be stored to a binary .mfu file and/or sent between MPI processes. One could then use this user-defined value to store the project id, for example, or any other bit of additional data. However, that similarly requires some extended development effort.

Regarding the ioctl error, yes, that could be changed to use 0 with a one line fix.

projids[idx] = 0; // <--- initialize to 0, stays as 0 if get_projid returns an error
if (get_projid(file, &projid) == 0) {
    projids[idx] = projid;
}

However, since 0 seems to be the default project id for many files, I thought it might be useful to use another value to represent those error cases. For example, if we change the datatype from uint32_t to int32_t, we could initialize to -1 instead of 0. However, that assumes that one doesn't use project ids above what an int32_t can represent (int32 overflow).

For that matter, we could leave the type as uint32_t and use the max uint32_t value assuming that one is not using that as a valid project id.

To really handle that cleanly, it might require using a second field to indicate whether the corresponding project id is valid.

It's your call on which value you want to initialize here.

@adilger
Copy link
Contributor

adilger commented Oct 10, 2023

The projid needs to be a uint32 field, but the -1 value is invalid (there is a named constant for this, like PROJID_INVALID or something).

I suspect from the name that the .../system/rc.service file is a symlink to a file on a filesystem without projid support?

I think it would be most useful to not store a projid for such cases, rather than trying to store an invalid projid for the file.

@markmoe19
Copy link
Author

Right, it was a link to a file on an NFS file system. Probably storing nothing for projid in that case is the best response, I agree.

@markmoe19
Copy link
Author

I'm a bit torn on how to handle projid when using .mfu file. We have kept some .mfu files around as it provides a sort of snap-shot of the metadata that can be queried without the filesystem around (or even if the filesystem is there, it is no longer the same as some time as passed). Maybe, until the .mfu file can store the projid, the only valid way to get projid in the text output file is if you are going directly from file system scan to text file (and not keeping a .mfu file)? The good news is that now I have a way to capture project-id for all the files quickly, thank you! :)

@adilger
Copy link
Contributor

adilger commented Aug 22, 2024

@adammoody I saw your patch on this issue from last year, and am wondering if you had made any progress toward landing this feature? Is it a hard requirement that there is a field in the .mfu file even for dcp/dsync to copy between existing filesystems, or only for saving the data to a backup?

For Lustre we also added an xattr interface to save/restore the projid with a virtual "trusted.projid" file that allows backup and restore of the project ID for applications that don't support it natively, but this is not exposed on the client by default.

KoyamaSohei added a commit to KoyamaSohei/mpifileutils that referenced this issue Oct 30, 2024
When copying files using dcp or dsync, extended attributes
such as project id are also copied.

Resolves hpc#553.

Signed-off-by: Sohei Koyama <skoyama@ddn.com>
KoyamaSohei added a commit to KoyamaSohei/mpifileutils that referenced this issue Oct 30, 2024
When copying files using dcp or dsync, extended attributes such as
project id are also copied.

Resolves hpc#553.

Signed-off-by: Sohei Koyama <skoyama@ddn.com>
@KoyamaSohei KoyamaSohei linked a pull request Oct 30, 2024 that will close this issue
KoyamaSohei added a commit to KoyamaSohei/mpifileutils that referenced this issue Oct 30, 2024
When copying files using dcp or dsync, extended attributes such as
project id are also copied.

Resolves hpc#553.

Signed-off-by: Sohei Koyama <skoyama@ddn.com>
KoyamaSohei added a commit to KoyamaSohei/mpifileutils that referenced this issue Oct 30, 2024
When copying files using dcp or dsync, extended attributes such as
project id are also copied.

Resolves hpc#553.

Signed-off-by: Sohei Koyama <skoyama@ddn.com>
KoyamaSohei added a commit to KoyamaSohei/mpifileutils that referenced this issue Oct 30, 2024
When copying files using dcp or dsync, extended attributes such as
project id are also copied.

Resolves hpc#553.

Signed-off-by: Sohei Koyama <skoyama@ddn.com>
@sihara
Copy link

sihara commented Oct 30, 2024

@adammoody it would be great if you can review PR that Sohei pushed.

@KoyamaSohei
Copy link

@ofaaland Could you please review it?

@daltonbohning daltonbohning assigned ofaaland and unassigned ofaaland Nov 13, 2024
@ofaaland
Copy link
Collaborator

@adilger @sihara PR #594 proposes to copy all of struct fsxattr from a source file to a target file, when doing dcp, dsync, etc. Some of the fields and flags look to me like they may have file-system-specific values. Do you know if copying the entire struct fsxattr is safe and appropriate? Thanks!

@adilger
Copy link
Contributor

adilger commented Nov 15, 2024

Lustre only understands/uses the project ID in this struct, and possibly some flags. However, I couldn't say for sure that the other fields are zeroed or otherwise set properly...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants