File path is only filename after ingestion using the REST Gateway #1418
Bradfordio asked this question in Q&A (unanswered, 1 reply)
-
That's interesting. I did not test it in debug mode, but I'm wondering what FSCrawler is actually receiving when we call it with [...]. If we are able to get the full information in FSCrawler, then that's a bug. Could you confirm whether the workaround could work for you?
-
Hello,
I'm relatively new to fscrawler so apologies if this is a very basic mistake I'm making on my end.
We are currently attempting to ingest millions of files of many different file types, stored in an S3 bucket, into ES. To accomplish this, we have fscrawler running as a REST gateway on a RH EC2 instance. We also have a Python script which does the following:
Reads from an SQS queue where the messages are an object key.
Parses the object key and, if it contains any prefixes (folders in S3), creates the same directory structure on the EC2 instance:
Downloads the file with the correct filename to that location
Uploads the file to the ES index via a curl command
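For readers following along, the steps above can be sketched roughly as below. This is not the original script, only a minimal illustration under assumptions: `DOWNLOAD_ROOT`, `UPLOAD_URL`, and the helper names are hypothetical, the endpoint shown is the `_upload` path on the default local REST address and may differ for your fscrawler version and settings, and the SQS read and S3 download are left as comments (with boto3 that would typically be `receive_message` and `download_file`).

```python
from pathlib import Path, PurePosixPath
from typing import List

# Hypothetical values for illustration; adjust to your environment.
DOWNLOAD_ROOT = Path("/tmp/fscrawler-staging")
UPLOAD_URL = "http://127.0.0.1:8080/fscrawler/_upload"  # assumed endpoint


def local_path_for_key(object_key: str, root: Path = DOWNLOAD_ROOT) -> Path:
    """Map an S3 object key to a local path that mirrors its prefixes."""
    # Drop a leading "/" and any ".." so the result always stays under root.
    parts = [p for p in PurePosixPath(object_key).parts if p not in ("/", "..")]
    if not parts:
        raise ValueError("empty object key")
    return root.joinpath(*parts)


def prepare_directory(target: Path) -> None:
    """Create the directory structure that corresponds to the key's prefixes."""
    target.parent.mkdir(parents=True, exist_ok=True)


def curl_upload_command(target: Path, url: str = UPLOAD_URL) -> List[str]:
    """Build the curl argument list that posts the file as a multipart form."""
    return ["curl", "-F", f"file=@{target}", url]


def handle_object_key(object_key: str, root: Path = DOWNLOAD_ROOT) -> List[str]:
    """Run the local part of the pipeline for one object key from the queue."""
    target = local_path_for_key(object_key, root)
    prepare_directory(target)
    # The S3 download into `target` would happen here (omitted in this sketch).
    return curl_upload_command(target)
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the file has been downloaded. Note that the full local path appears only in the curl argument here; what the gateway actually receives in the multipart request is a separate question.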
This process works for us, and we have successfully ingested many files. However, we have noticed one thing that is vital for us: the path fields, both virtual and real, are returned with only the filename:
Can anyone please help me understand why the Linux file path is missing?
Config:
fscrawler mappings are the default:
https://github.com/dadoonet/fscrawler/blob/master/settings/src/main/resources/fr/pilato/elasticsearch/crawler/fs/_default/7/_settings.json
fscrawler settings: