Change remote globally in Git history #5935
Replies: 23 comments 6 replies
-
@dmpetrov This is mostly about properly maintaining your project, e.g. using tags or branches for released versions, which can easily be updated to point to the new bucket.
-
E.g. say some project died and closed its remote, but someone has the cache somewhere. In that case, you fork it, set up your own remote, and rely on that. Or, you could specify
-
@efiop could you please elaborate? How should the team properly maintain the project to minimize this amount of work? Use case: a team has a long-running project (a year or more), hundreds of commits, and dozens of releases (with tags). All Git history is important. Some clients might still use old ML models, and all changes in data sources have to be fully tracked from inception. Unreleased commits (without tags) might also be important (for example, we decide to release an old but promising experiment). Suddenly the team decides to change cloud providers. It looks like a tremendous amount of work has to be done to make that happen. It would be great if DVC could do it in a few commands.
-
Oops, sorry for the delay.
Well, giving that to your customers was a bad idea from the beginning 🙂 So say you have some tag like v1 that was using s3. Now you are migrating to gs, so you go to v1, create a branch from it, adjust the remote to point to the new gs location, commit, and move v1 to this new commit. Then you'll have to make your users update. But if you, as a maintainer, had taken this more seriously earlier, you would have created some kind of proxy remote, e.g. an http remote (like we do in the dvc core project) that you can trivially switch from s3 to gs without needing to adjust anything in the projects themselves.
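A rough sketch of the tag-moving steps described above, under some assumptions: the function name `retag_with_new_remote`, the remote name, and the gs URL are all hypothetical placeholders, and the remote is assumed to be defined in the committed `.dvc/config`. Note that `git tag -f` as used here creates a lightweight tag, so an annotated tag would lose its annotation.

```shell
#!/bin/sh
# Sketch only: move an existing release tag to a new commit whose sole change
# is the DVC remote URL. All names passed in are placeholders.
retag_with_new_remote() {
    tag="$1"          # existing tag, e.g. v1
    remote_name="$2"  # DVC remote name as committed in .dvc/config (assumed)
    new_url="$3"      # new storage location (hypothetical)

    # Create a temporary branch at the tagged commit.
    git checkout -q -b "retag-$tag" "$tag" || return 1

    # Rewrite the remote URL stored in the project-level .dvc/config.
    dvc remote modify "$remote_name" url "$new_url" || return 1

    # Commit the config change and move the tag onto the new commit.
    git add .dvc/config || return 1
    git commit -q -m "Point DVC remote $remote_name at $new_url" || return 1
    git tag -f "$tag" >/dev/null || return 1
}

# Usage (hypothetical values):
#   retag_with_new_remote v1 storage gs://new-bucket/dvc-store
#   git push --force origin refs/tags/v1
```

Users who already fetched the old tag would still need to re-fetch it, which is the "make your users update" step mentioned above.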
-
@efiop these are good workarounds. I urge everyone to think about a holistic solution instead. Ideally, we need the same solution as Git has: I can easily move a repo with its entire history from, for example, GitHub to GitLab with a couple of commands, just by changing the remote, pushing, and removing the old one. Is there a way to implement something similar in DVC?
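For reference, the Git-side move described here might look roughly like the following. This is only a sketch: the function name `move_git_remote` and the remote names and URL in the usage comment are hypothetical, and `--all` pushes local branches only, so branches that exist solely on the old remote would need to be checked out first.

```shell
#!/bin/sh
# Sketch only: move a Git repository, with branches and tags, to a new remote
# and drop the old one. Remote names and URL are placeholders.
move_git_remote() {
    old_name="$1"  # remote to remove afterwards, e.g. origin
    new_name="$2"  # name for the new remote
    new_url="$3"   # URL of the new (empty) repository

    git remote add "$new_name" "$new_url" &&
    git push -q "$new_name" --all &&   # push every local branch
    git push -q "$new_name" --tags &&  # push every tag
    git remote remove "$old_name"
}

# Usage (hypothetical URL):
#   move_git_remote origin gitlab git@gitlab.com:team/project.git
```

Nothing inside the commits themselves has to change, which is the property the thread is asking DVC to match.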
-
@dmpetrov, @efiop, what about using external tools like the AWS CLI or the Google Cloud Platform CLI to sync the cache? For example, if you are migrating to GCP, it would be
-
If you don't care about modifying your Git history,
-
@MrOutis sure, you can use the tools. But it is not clear how to change the links in Git history - you will still have
Modifying the commit - yes, it is a possibility. Is there any better way to define and change data remotes? (Most likely it should not be committed to config.)
-
👍 on my end, it feels that there should be a better solution to this.
-
Sure:
Maybe I don't quite understand how you propose to "modify git history". The workarounds I've provided initially update the branches/tags that are used or propose to use some type of proxy to route your
-
How about we always rely on the latest commit in a branch to determine the actual remote? No matter what is committed in the history.
-
@shcheklein sounds fragile and non-obvious. Plus it again won't work until the user
-
@efiop The code you provided looks like another workaround, not a holistic solution for changing remotes "globally". The problem - if a user checks out an old revision (clones or imports
@shcheklein thank you! It is definitely a global solution that might work. There are some issues with this approach (thanks to @efiop for pointing this out), but at least we have something to consider and/or improve.
-
It looks like you simply want to rewrite the history for the config file. EDIT: Another option is to completely separate config from Git history, which we already support in the form of global/user/local configs.
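As an illustration of the local-config option, a minimal sketch might look like this, assuming a hypothetical helper name `use_local_remote` and placeholder remote name and URL. With `--local`, DVC writes to `.dvc/config.local`, which is not committed, so the setting stays in place whichever commit is checked out.

```shell
#!/bin/sh
# Sketch only: define the default remote in the uncommitted local config so
# it applies regardless of what .dvc/config says at the checked-out commit.
use_local_remote() {
    remote_name="$1"  # remote name (placeholder)
    new_url="$2"      # new storage location (hypothetical)

    # --local targets .dvc/config.local, -d makes the remote the default,
    # -f overwrites an existing remote of the same name.
    dvc remote add --local -d -f "$remote_name" "$new_url"
}

# Usage (hypothetical values):
#   use_local_remote storage gs://new-bucket/dvc-store
```

The trade-off is that this has to be repeated in every working copy, since the local config is not shared through Git.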
-
@efiop can you please elaborate or point to the details of proxy-remote implementation? Also to summarize possible solutions I see in the thread as I am facing the same inconvenience (I want to
-
Hi @dimitry12, for this option, would
-
@dberenbaum yes, local config will also work, as long as you manage it properly, i.e. update it across all your working copies.
-
Thanks, @Suor! I'm wondering if this issue can be closed then, since it seems that the introduction of the local config makes changing remotes globally possible and on par with Git.
-
Local config works for me.
-
@dberenbaum my 2 cents on this: local config can be a good temporary solution, but it somewhat undermines the point of repos being self-descriptive and self-contained. Why can that be important? For two reasons (and maybe I'm missing something else):
-
See discussion at https://discord.com/channels/485586884165107732/563406153334128681/818978209944961045 and https://discord.com/channels/485586884165107732/563406153334128681/819212764282748928
-
Another argument for why local config is insufficient: for data registry repos where the data is being fetched from outside the project via
-
Is there a good solution for this? Does the
If I have a series of commits with one remote, but then switch to another remote, I thought the following would work to transfer everything to the new remote:
git checkout <COMMIT_WITH_OLD_REMOTE>
dvc fetch -A
git checkout <COMMIT_WITH_NEW_REMOTE>
dvc push -A -r new-remote
It does not work as I would've expected. Instead, I get this:
-
Changing buckets and cloud easily for a single project is a very compelling feature.
But in fact, when I need to transfer a data remote from one bucket to another (or to another cloud), I can do that properly only on the HEAD of the repo. All the old commits will still have the old remote (in .dvc/config). As a result, when I check out an old commit in Git history, the old remote will be used. So I need to keep the old data remote (bucket), or I'll have trouble using my old Git commits.
Is it possible to make remote settings "global"? A single remote change should change it everywhere in the Git history. How does Git do that, and can it work for DVC? Are there other options?
All ideas are welcome!