Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consider mounting data folder as sub-directory of pacta-data/ #63

Open
cjyetman opened this issue Feb 26, 2024 · 11 comments · Fixed by #72
Open

consider mounting data folder as sub-directory of pacta-data/ #63

cjyetman opened this issue Feb 26, 2024 · 11 comments · Fixed by #72
Labels
ADO Maintenance Day! priority

Comments

@cjyetman
Copy link
Member

cjyetman commented Feb 26, 2024

The current docker-compose.yml expects that you point to a local pacta-data/ directory which should have a sub-directory named for the desired timestamp/quarter, e.g. pacta-data/2021Q4. Given our current strategy of building PACTA data directories, I think it might be more natural to point directly to that sub-directory, since one might have a directory with numerous sub-directories each containing different builds/versions of PACTA data, but you really only want/need one, and it could be confusing which one is being used.

So instead of this in the docker-compose.yml

volumes:
- ${PACTA_DATA_PATH}:/pacta-data

and this in the .env

PACTA_DATA_PATH=PATH/TO/pacta-data
R_CONFIG_ACTIVE=2022Q4

we might have this is the docker-compose.yml

    volumes:
      - ${PACTA_DATA_PATH}:/pacta-data/2022Q4

and this in the .env

PACTA_DATA_PATH=PATH/TO/pacta-data/2022Q4_20240215
R_CONFIG_ACTIVE=2022Q4

The only problem is that the /pacta-data/2022Q4 in the docker-compose.yml would need to adapt to whatever timestamp/quarter you're targeting, and I'm not sure how best to resolve that.

AB#10133

@cjyetman cjyetman changed the title consider copying data folder as sub-directory of pacta-data/ consider mounting data folder as sub-directory of pacta-data/ Feb 27, 2024
@cjyetman cjyetman reopened this Feb 29, 2024
@cjyetman
Copy link
Member Author

Reopening because #87 reveals this is still not adequately fixed. If you have R_CONFIG_ACTIVE=PA2024CH in your .env, that will activate the PA2024CH config, but it will also set the path that the PACTA data is mounted to as /pacta-data/PA2024CH/, based on target: /pacta-data/${R_CONFIG_ACTIVE} in the docker-compose.yml

@jdhoffa
Copy link
Member

jdhoffa commented Feb 29, 2024

Hmm not ideal. I will open a hotfix to get it working again, and open an issue to track progress on designing a more appropriate solution.

@jdhoffa
Copy link
Member

jdhoffa commented Feb 29, 2024

Ah so it turns out that this issue is already the "follow-up issue" i was going to open haha. Totally agree with the strategy proposed here.

(I think @AlexAxthelm closed this issue before I had a chance to read it XD)

@AlexAxthelm
Copy link
Contributor

Hmm. I remember thinking when I saw that R_CONFIG_ACTIVE in a path that it would probably give us grief at some point. wasn't expecting it within a week.

@jdhoffa
Copy link
Member

jdhoffa commented Feb 29, 2024

😂 for real
I don't think such a big deal to solve though tbh

@cjyetman
Copy link
Member Author

cjyetman commented Mar 1, 2024

I think the logic is:

  • the target path inside the image needs to be parameterized in the docker-compose.yml to whatever is the relevant quarter
  • the docker-compose.yml only gets "input" from environment variables (although docker-compose CLI could likely take params as well)
  • a new/additional param in the .env is necessary, e.g. PACTA_DATA_TIMESTAMP, so that it can be distinguished from R_CONFIG_ACTIVE (which could = e.g. 2023Q4, but could also = e.g. PA2024CH)

@jdhoffa
Copy link
Member

jdhoffa commented Mar 1, 2024

The issue there is that it conflicts with the assumption that config.yml should set that information somehow

(e.g. it would be possible in that situation that PACTA_DATA_TIMESTAMP point to 2023Q4, R_CONFIG_ACTIVE points to PA2024CH BUT in config.yml that PA2024CH points to 2024Q4 or something like that)

seems like an unnecessary maintenance burden, so I am going to think it through more cautiously.

@cjyetman
Copy link
Member Author

cjyetman commented Mar 1, 2024

yeah, I guess that's what I meant... the PACTA data timestamp is necessary in the docker-compose step, and since that takes input from env vars not the config, the PACTA data timestamp kinda needs to be defined in the env vars not the config

@jdhoffa jdhoffa self-assigned this Mar 1, 2024
@jdhoffa jdhoffa added the ADO Maintenance Day! label Mar 1, 2024
@jdhoffa
Copy link
Member

jdhoffa commented Mar 1, 2024

I think one of the tricky parts is that workflow.transition.monitor expects a directory called 2023Q4 but pacta-data directories are now named e.g. 2023Q4_20240229T200630Z

So already, I need to mount in the contents of a directory into a newly named directory.
This is why I can't just mount 2023Q4_20240229T200630Z into pacta-data which I think would be the most preferable approach.

Thoughts?

@cjyetman
Copy link
Member Author

cjyetman commented Mar 1, 2024

I think one of the tricky parts is that workflow.transition.monitor expects a directory called 2023Q4 but pacta-data directories are now named e.g. 2023Q4_20240229T200630Z

yes, agree 💯, that is the crux of it

Though the additional thing that plays a role is, you may need the mounted directory to look like pacta-data/2022Q4 or pacta-data/2021Q4 depending on what quarter you run. That's a holdover from the olden days when we allowed a TM Docker image to be able to run multiple timestamps/quarters. Generally speaking, we're not doing that anymore, but the capability is still there. I think in the future, e.g. with workflow.pacta, that might not be an issue anymore, because I designed it to have the data outside the Docker image, so it can be mounted where/however you want and the input config determines where the code should look for it, e.g there's no strong restriction like pacta-data/2023Q4, it could be /2023Q4_20240229T200630Z as long as the config has "data_dir": "/2023Q4_20240229T200630Z"

@jdhoffa
Copy link
Member

jdhoffa commented Mar 1, 2024

Yup totally.
I think all of this points to prioritizing superseding it with workflow.pacta (i know you both agree, just saying it aloud).

I am tempted to not work much on this issue until after that is done/ after CH COP is delivered tbh (I don't want to jump through a bunch of hoops with an odd architecture if we have the opportunity to just update the architecture instead)

@jdhoffa jdhoffa removed their assignment Apr 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ADO Maintenance Day! priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants