Store checksum information as xattr on hdfs #25

Open
juztas opened this issue Feb 26, 2020 · 2 comments
Comments

juztas commented Feb 26, 2020

Hadoop has supported extended attributes (xattrs) since the 2.5.0 release, so checksum values could be stored as xattrs rather than as files under the /cksums directory, as is done right now.
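
For illustration, Hadoop's own command-line client exposes xattrs through `hdfs dfs -setfattr` and `hdfs dfs -getfattr`. Below is a minimal sketch of storing and reading a checksum that way. The attribute name `user.checksum.adler32` and the function names are assumptions made up for this example, not necessarily what any plugin uses:

```shell
#!/bin/sh
# Sketch only: XATTR_NAME is a hypothetical attribute name chosen for this example.
# HDFS requires a namespace prefix such as "user." on attribute names.
XATTR_NAME="user.checksum.adler32"

# Store a checksum value on an HDFS file as an extended attribute.
store_checksum() {
    hdfs dfs -setfattr -n "$XATTR_NAME" -v "$2" "$1"
}

# Print the stored attribute of an HDFS file.
read_checksum() {
    hdfs dfs -getfattr -n "$XATTR_NAME" "$1"
}
```

It would be called as `store_checksum <HDFS path to file> <checksum>` followed by `read_checksum <HDFS path to file>`.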

@PerilousApricot

As @bbockelm mentioned, you can't access xattrs from the libhdfs C library, even in the latest trunk, so it will be difficult to use them from this plugin (see https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/include/hdfs/hdfs.h).

kreczko commented Jul 6, 2022

While extended attributes are not available in the libhdfs C library, XRootD now allows drop-in checksum plugins.
I've written such a plugin in Python, which stores the checksum results in the extended attributes. It is currently under test.
It borrows heavily from the cephsum plugin.

To try it out, you will need Python >=3.8:

pip install "xrdsum[hdfs]"

Usage example:

/usr/bin/time -v xrdsum --verbose --debug get <HDFS path to file> --read-size 128

xrootd config

# ensure cksum adler32 is included in the tpc directive, so that the checksum is calculated by default on transfer
ofs.tpc cksum adler32 fcreds ?gsi =X509_USER_PROXY autorm xfr 40 pgm /etc/xrootd/xrdcp-tpc.sh

# add this line to trigger external checksum calculation. It would be overridden by any other xrootd.chksum line
xrootd.chksum max 50 adler32 /etc/xrootd/xrdsum.sh

with /etc/xrootd/xrdcp-tpc.sh containing:

#!/bin/bash
# bash is required: the ${@:offset:length} slicing below is not POSIX sh

# adapted from https://github.com/snafus/cephsum/blob/master/scripts/xrdcp-tpc.sh
# Original code:
# /usr/bin/xrdcp --server -f $1 root://$XRDXROOTD_PROXY/$2

# Take the last two arguments as SRC and DST; all others are passed through as additional arguments
OTHERARGS="${@:1:$#-2}"
DSTFILE="${@:$#:1}"
SRCFILE="${@:$#-1:1}"

# $OTHERARGS is left unquoted on purpose so that it splits back into separate options
/usr/bin/xrdcp $OTHERARGS --server -f "$SRCFILE" "root://$XRDXROOTD_PROXY/$DSTFILE"
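
For hosts where /bin/sh is a strict POSIX shell such as dash, the same split of the trailing SRC and DST arguments can be sketched without bash-only syntax. The function name `split_tpc_args` is made up for this example, and it only prints the three values instead of calling xrdcp:

```shell
#!/bin/sh
# Sketch only: split_tpc_args is a hypothetical helper, not part of cephsum or xrdsum.
# It treats the last two arguments as SRC and DST and everything before them as options.
split_tpc_args() {
    if [ "$#" -lt 2 ]; then
        echo "usage: split_tpc_args [options...] SRC DST" >&2
        return 1
    fi
    others=""
    # Consume leading arguments until only SRC and DST remain.
    while [ "$#" -gt 2 ]; do
        others="${others:+$others }$1"
        shift
    done
    echo "OTHERARGS=$others"
    echo "SRCFILE=$1"
    echo "DSTFILE=$2"
}
```

Like the original, this joins the extra options into one space-separated string, so options that themselves contain spaces would not survive the round trip.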

and with /etc/xrootd/xrdsum.sh containing:

#!/usr/bin/env bash

RESULT=$(xrdsum get --store-result --chunk-size 64 --verbose --storage-catalog /etc/xrootd/storage.xml "$1")
ECODE=$?

# XRootD expects the result on stdout: the checksum followed by a newline
printf "%s\n" "$RESULT"
exit "$ECODE"
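
Since XRootD reads the checksum from stdout, a wrapper like this could also sanity-check the value before printing it. This is a rough sketch, assuming an adler32 checksum is printed as exactly eight hexadecimal digits. The helper name `is_adler32_hex` is hypothetical:

```shell
#!/bin/sh
# Sketch only: is_adler32_hex is a hypothetical helper, not part of xrdsum or XRootD.
# Succeeds only if $1 consists of exactly eight hexadecimal digits.
is_adler32_hex() {
    case "$1" in
        ""|*[!0-9a-fA-F]*) return 1 ;;
    esac
    [ "${#1}" -eq 8 ]
}
```

In the wrapper it could be used as `is_adler32_hex "$RESULT" || exit 1` just before the printf line.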
