Create mechanism for querying Nuxeo DB directly rather than using API #12

Open
barbarahui opened this issue Oct 25, 2024 · 1 comment

@barbarahui

We have discovered yet another bug in the Nuxeo API. This query doesn't actually produce results ordered by lastModified:

    query = (
        "SELECT * FROM SampleCustomPicture, CustomFile, CustomVideo, CustomAudio, CustomThreeD "
        f"WHERE ecm:ancestorId = '{collection['uid']}' AND "
        "ecm:isVersion = 0 AND "
        "ecm:isTrashed = 0 "
        "ORDER BY lastModified desc"
    )

The results are completely out of order with respect to lastModified; they come back in the same order as when the ORDER BY clause is left off entirely. I've tried different API endpoints, and also querying on the path rather than the UID, and I get the same results. This happens even for small collections with fewer than 100 records.
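Until the ordering can be trusted, one stopgap is to verify the order and re-sort on the client. The sketch below assumes each result is a dict (as in the API's list of document entries) carrying a `lastModified` ISO-8601 timestamp such as `"2024-10-25T17:03:11.123Z"`; the helper names `is_sorted_desc` and `sort_desc` are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, List

Doc = Dict[str, Any]


def _modified(doc: Doc) -> datetime:
    # ASSUMPTION: each document dict has an ISO-8601 "lastModified" string.
    # "Z" is normalised to "+00:00" so fromisoformat() also works before 3.11.
    return datetime.fromisoformat(doc["lastModified"].replace("Z", "+00:00"))


def is_sorted_desc(docs: List[Doc]) -> bool:
    """Return True if docs are ordered newest-first by lastModified."""
    times = [_modified(d) for d in docs]
    return all(a >= b for a, b in zip(times, times[1:]))


def sort_desc(docs: List[Doc]) -> List[Doc]:
    """Return a new list sorted newest-first by lastModified."""
    return sorted(docs, key=_modified, reverse=True)


docs = [
    {"uid": "a", "lastModified": "2024-10-01T10:00:00.000Z"},
    {"uid": "b", "lastModified": "2024-10-20T10:00:00.000Z"},
]
print(is_sorted_desc(docs))                 # False
print([d["uid"] for d in sort_desc(docs)])  # ['b', 'a']
```

Client-side sorting is only correct once the complete result set has been fetched, so it does not help with the separate problem of the API returning inconsistent result sets for very large collections.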

Given the unreliability of the API in general (it doesn't reliably return the same result set for very large collections), I think we should just bypass the API altogether and figure out how to query the DB directly.
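To illustrate what a direct query could look like, here is a minimal sketch against an in-memory SQLite database. The tables `hierarchy` and `dublincore` and their columns are hypothetical stand-ins, since the actual Nuxeo DB schema still has to be worked out; the recursive CTE plays the role of `ecm:ancestorId` by walking `parentid` links down from the collection, and the ordering is done by the database itself.

```python
import sqlite3

# HYPOTHETICAL, simplified tables; the real Nuxeo schema is still unknown.
SCHEMA = """
CREATE TABLE hierarchy (
    id          TEXT PRIMARY KEY,
    parentid    TEXT,
    primarytype TEXT NOT NULL,
    isversion   INTEGER NOT NULL DEFAULT 0,
    istrashed   INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE dublincore (
    id       TEXT PRIMARY KEY REFERENCES hierarchy(id),
    modified TEXT  -- same-format ISO-8601 strings sort correctly as text
);
"""

DOC_TYPES = ("SampleCustomPicture", "CustomFile", "CustomVideo",
             "CustomAudio", "CustomThreeD")
PLACEHOLDERS = ", ".join(["?"] * len(DOC_TYPES))

# Collect every descendant of the collection, then filter and order them.
QUERY = f"""
WITH RECURSIVE descendants(id) AS (
    SELECT id FROM hierarchy WHERE parentid = ?
    UNION ALL
    SELECT h.id FROM hierarchy AS h JOIN descendants AS d ON h.parentid = d.id
)
SELECT h.id, dc.modified
FROM descendants AS d
JOIN hierarchy AS h ON h.id = d.id
JOIN dublincore AS dc ON dc.id = h.id
WHERE h.primarytype IN ({PLACEHOLDERS})
  AND h.isversion = 0
  AND h.istrashed = 0
ORDER BY dc.modified DESC
"""


def fetch_collection_docs(conn, collection_uid):
    """Return (id, modified) rows for a collection, newest first."""
    return conn.execute(QUERY, (collection_uid, *DOC_TYPES)).fetchall()


# Demo with an in-memory database and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO hierarchy VALUES (?, ?, ?, ?, ?)",
    [
        ("coll", None, "Folder", 0, 0),
        ("d1", "coll", "CustomFile", 0, 0),
        ("d2", "coll", "SampleCustomPicture", 0, 0),
        ("sub", "coll", "Folder", 0, 0),
        ("d3", "sub", "CustomVideo", 0, 0),   # nested one level down
        ("d4", "coll", "CustomFile", 1, 0),   # a version: excluded
        ("d5", "coll", "CustomAudio", 0, 1),  # trashed: excluded
    ],
)
conn.executemany(
    "INSERT INTO dublincore VALUES (?, ?)",
    [
        ("d1", "2024-10-01T09:00:00"),
        ("d2", "2024-10-20T09:00:00"),
        ("d3", "2024-10-10T09:00:00"),
        ("d4", "2024-10-25T09:00:00"),
        ("d5", "2024-10-24T09:00:00"),
        ("sub", "2024-10-30T09:00:00"),
    ],
)
rows = fetch_collection_docs(conn, "coll")
print([row[0] for row in rows])  # ['d2', 'd3', 'd1']
```

Against the real database, the same shape of query would go through that database's own driver (for example psycopg for PostgreSQL, which uses `%s` placeholders rather than `?`), with table and column names taken from the actual schema.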

It’ll take some work to figure out the schema, and there’s extra infrastructure work as well, because the DB is locked down in a VPN in the pad-dsc account, while the nuxeo-merritt job runs in the pad-prd account because that’s where Airflow is. I’m not sure how to go about granting cross-account access to the DB, but I’m hoping it can be done.

@barbarahui barbarahui self-assigned this Oct 25, 2024
@christinklez

Barbara will ask IAS about the possibility of cross-account database sharing.
She can also ask IAS whether it's possible to move the Nuxeo data S3 bucket.

@barbarahui barbarahui changed the title Update nuxeo-merritt feed to query the DB directly rather than using the API Update nuxeo-merritt feed and rikolti nuxeo fetcher to query the DB directly rather than using the API Oct 28, 2024
@barbarahui barbarahui changed the title Update nuxeo-merritt feed and rikolti nuxeo fetcher to query the DB directly rather than using the API Create mechanism for querying Nuxeo DB directly rather than using API Oct 28, 2024