Using git-lfs instead of submodules? #108

Open
ghost opened this issue May 27, 2022 · 6 comments

ghost commented May 27, 2022

Currently, game data is split across multiple repositories because of the huge size of 3D models and sounds, which makes repos very slow to clone or sync and hits GitHub's repository size limits.

This makes complete cleanups harder to do. For example, UnvanquishedAssets/res-buildables_src.dpkdir#8 is incomplete: if the intent was to do a complete cleanup, then it missed files in:

There is also the fact that each merged PR in a submodule will trigger several "submodule syncing" commits in other repos, which is really just noise in history.

I was wondering whether it would be a better solution to use git lfs instead, which, according to the documentation, requires these steps:

  1. configure git locally (once per machine) to install lfs
  2. configure the repository so that only specific files are sent to lfs
  3. "There is no step three"
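
As a rough sketch of what step 2 amounts to: `git lfs install` covers step 1, and `git lfs track <pattern>` records each tracked pattern as a line in `.gitattributes`. The Python below only renders those lines so the result can be reviewed before committing. The pattern list is a hypothetical example, not an inventory of what the asset packages actually contain.

```python
# Hypothetical patterns for large binary assets (assumption, for illustration only).
LFS_PATTERNS = ["*.iqm", "*.ogg", "*.png"]


def lfs_attribute_line(pattern: str) -> str:
    """Return the .gitattributes line that `git lfs track <pattern>` writes."""
    return f"{pattern} filter=lfs diff=lfs merge=lfs -text"


def render_gitattributes(patterns) -> str:
    """Render one LFS attribute line per pattern, newline-terminated."""
    return "".join(lfs_attribute_line(p) + "\n" for p in patterns)


if __name__ == "__main__":
    # Print the content that would go into .gitattributes at the repo root.
    print(render_gitattributes(LFS_PATTERNS), end="")
```

Files matching those patterns would then be stored as small pointer files in git, with the actual content fetched through lfs.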

I am certain this solution would bring problems of its own, but I have not seen any discussion about it in past issues, nor do I remember seeing one on IRC.
This would not fix the submodules problem, since I think:

  • the game data should be kept separated from the game logic repo,
  • I would keep maps in their own repos too (maybe it would also be desirable to move maps into a single repo; I'm really not sure about this one, and it would at least require creating subfolders in such a "map-officials" repo),

but I think it would make things easier:

  • fewer PRs needed to contribute a change such as touching a building,
  • less noise (fewer "sync submodule" entries in git logs)

If this is a bad idea, then at least this discussion should make it explicit why. @illwieckz is probably the person most affected by such a change, though.

@ghost ghost added the question label May 27, 2022
@illwieckz
Member

Note that res-legacy is a special case and it is meant to live in a separate repository with the only other option being to delete it, so even if we were using git-lfs we would keep it separate.

Using submodules also allows handling each package as a project, each one can have its own version, and having separate modules makes per-package version computation easy. You mention maps; those are indeed a good example. Texture sets also seem to live well as separate repositories.

The size of a “big git repo” even without lfs isn't a big problem as we can migrate to gitlab that allows large repositories (but then do we have time to?).

In the past I made a large res-models package, then I split it into res-buildables, res-players and res-weapons, but this is just a matter of taste. Of course, splitting it made things easier with GitHub.

fewer PRs needed to contribute a change such as touching a building

This is basically why I first put configs in res-buildables, res-players and res-weapons next to their models, textures and sounds, but then we moved them to unvanquished to have them in a single place. 😅️ No solution seems to be perfect: even data in LFS would not solve the problem of unvanquished dpkdir not being part of Unvanquished code repository for example.

@ghost
Author

ghost commented May 28, 2022

The size of a “big git repo” even without lfs isn't a big problem

I disagree. I clearly remember how painful it was for me to clone big repos without a fiber connection, since git cannot resume an interrupted clone. And I know many people in the world still don't have fiber connections (or have an unreliable power grid, and thus an unreliable internet connection even with fiber).

The size of a “big git repo” even without lfs isn't a big problem as we can migrate to gitlab that allows large repositories (but then do we have time to?).

GitLab's official instance is still 50% proprietary. The one thing it would help with is organising the many repos, which does not really seem worth the effort to me.

even data in LFS would not solve the problem of unvanquished dpkdir not being part of Unvanquished code repository for example

Since I never wrote it down in a persistent place, let's do it now.

Unvanquished_src.dpkdir contains several parts which are, in practice, code, which is tightly coupled with the C++ side:

  • behavior trees
  • lua/rml files
  • beacons
  • weapons, missiles, classes, upgrades, buildables configuration files (note that the BBoxes are still outside of unvanquished_src.dpkdir and are unrelated[¹] to graphics)

Several of the changes I made to BTs required both C++ and BT changes (and I still want a way to put more stuff in there, like allowing bots to rotate in place when healing, splitting the heal action in two, etc.). The RmlUi transition also required updating things in 2 repos. If someone wanted to get some actual consistency in the SMG/rifle naming, it would be the same: editing multiple repos. Adding or refactoring HUD elements (walk/run/sprint in 8346bb466 for example, but also visual indications of being slowed down, blocked, burnt by acid/fire, etc., which I really want to get done someday) also requires multiple PRs...
I say "multiple PRs", but this obviously comes with all the associated burden: more commits in local repos, which have to be synced while implementing and testing changes, synced again while rebasing to address review comments, and which even had to be synced after a merge (that one is in the past, thanks!).

You invoke «allows handling each package as a project, each one can have its own version, and having separate modules makes per-package version computation easy» and I agree this makes sense for some content (maps). But for core data like the audiovisual assets of players and buildings, or the GUI/HUD (which in a game usually need to follow a unified style, at least per faction), not really.
As you're likely aware, considering what you said about the res-legacy pack.

Having the audiovisual feeling split from the actual gameplay rules also makes more sense[¹] than having the gameplay rules split across 5 repos (C++, gameconfig, buildables, missiles, players... I think I have not missed any?).

Now that I have talked about the data which is tightly coupled, I should mention that icons are now tightly coupled with it, too. So they should go in the same repo, with the exception, I guess, of the legacy ones.

Overall, I think it's not a good idea to have that many packages. Imo, what would make sense is:

  • unvanquished: C++ code, BT, gameplay config, and everything needed to build the GUI/HUD
  • unvanquished-data: core gameplay: players, buildings, weapons, missiles... artworks: sounds, models, textures
  • other stuff: untouched

No solution seems to be perfect

By perfection's nature, nothing is ever perfect.

1: BBoxes are lightly coupled with the model's size and scale, as the BSuit's headshot bug of 0.51 showed

@illwieckz
Member

illwieckz commented May 28, 2022

Since I never wrote it down in a persistent place, let's do it now.
Unvanquished_src.dpkdir contains several parts which are, in practice, code, which is tightly coupled with the C++ side

Yep, and more than that, the unvanquished dpkdir is a bastard package that also ships files tightly coupled with the engine (not Unvanquished). In the past it also shipped things for the editor (and it may still be needed as a package in some ways).

Anyway git lfs does not provide any solution for the specific case of the unvanquished dpkdir.

I know things may still be rough and more comfort would be appreciated, but I want to point out something important:

Being able to complain that this could be better is a luxury. I'm not saying it negatively; I mean the point we have reached is very good.

Let's give a simple example: being able to complain that one may have to do two pull requests to implement one change means one can actually submit changes as pull requests, which is a luxury.

  • Everything is tracked in git repositories.
  • We can build everything with tools.
  • We have a workflow.

Reaching this may have required 5 years of hard work. I only speak for myself, but while I see things that could be done better, I appreciate what we already have. We actually have solutions for storage, editing, contribution, version control, build and packaging for every part of the game. Sometimes it requires multiple PRs, but what a luxury it is!

Previously, all we had was packages with prebuilt assets made by hand. Half of the things in the unvanquished dpk were never tracked in any repository. Source files for assets were scattered everywhere. I actually wrote a web crawler to fetch every asset from every link posted on the forum and on the chat over the previous 6 or 7 years.

Repositories were created for every package (before that, only a couple of maps had a repository, and maybe only one had a makefile to build it with the required options). Map, texture and model repositories were created from scratch, and their history was redone from scratch. For example, you can look at this repository's history around 2017, starting from 1c16bf3. For buildables, weapons and assets, you can look around 2016 and the commit named add readme: older commits are from Unvanquished, and more recent commits were redone from things stored neither in Unvanquished nor anywhere else, until we started to use those repositories for newer stuff. You may have a look at the map-forlorn and tex-ex repository histories, for example. And among all the assets, a huge cleanup had to be done, since nothing was tracking what had been removed.

NetRadiant was edited to fit a workflow, q3map2 was edited to fit a workflow, daemonmap was edited to fit a workflow, sloth was edited to fit a workflow, iqm was edited to fit a workflow, urcheon was written to fit a workflow. So in the end, sometimes we have to make multiple PRs, but that means we have repositories and a workflow.

Some people complain we do “engine, engine, engine”, but haven't we also spent 7 years on “workflow, workflow, workflow”? Sure, we will still improve it incrementally, like when we moved config files from buildables/players/weapons to unvanquished because it turned out to be easier to track them there (and we needed experience to make that choice), but redoing the asset layout and potentially deeply changing the asset workflow may be too much. As a simple example, if we migrate to a large repository, then we cannot compute a version per package based on the git history. This is not a big problem, but we may have to write some new versioning code in tools like Urcheon: changes like this imply other changes in cascade. Tools have to be edited to fit the new workflow, etc.
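
To give an idea of the kind of new versioning code this would mean, here is a minimal sketch that limits the history to one package subfolder of a merged repository. This is not how Urcheon computes versions: the `base+count-hash` format, the function names and the package path used in the example are all assumptions made up for illustration. It only relies on `git rev-list --count` and `git log -1 --format=%h`, both restricted to a path.

```python
import subprocess


def package_version_commands(package_dir: str) -> dict:
    """Build git commands whose history is limited to one package subfolder."""
    return {
        # Number of commits reachable from HEAD that touch the subfolder.
        "count": ["git", "rev-list", "--count", "HEAD", "--", package_dir],
        # Abbreviated hash of the latest commit that touches the subfolder.
        "last": ["git", "log", "-1", "--format=%h", "HEAD", "--", package_dir],
    }


def format_version(base: str, count: int, short_hash: str) -> str:
    """Hypothetical version format: base version, commit count, short hash."""
    return f"{base}+{count}-{short_hash}"


def compute_version(repo: str, package_dir: str, base: str = "0.0") -> str:
    """Run the commands inside `repo` and format a per-package version."""
    def run(args):
        result = subprocess.run(
            args, cwd=repo, check=True, capture_output=True, text=True
        )
        return result.stdout.strip()

    commands = package_version_commands(package_dir)
    return format_version(base, int(run(commands["count"])), run(commands["last"]))
```

For instance, `compute_version(".", "pkg/res-players_src.dpkdir")` would only count commits touching that (hypothetical) subfolder, so a change in another package would not bump this package's version.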

Right now, the only real benefit I see in git lfs is that it could allow a big model repository merging players, buildables and weapons without hitting standard git repo limitations.

Things like map and texture packages won't benefit that much from a large repo, nor from git lfs.

The unvanquished dpkdir problem is not a problem for git lfs to solve. It is a separate repository so that it can be built by urcheon (a subfolder of Unvanquished could do that), so that its root can be cloned in UnvanquishedAssets/src (a subfolder of Unvanquished cannot do that), and so that we can pass a single -pakpath to the engine (or the editor, or the compiler) for all the packages, including the unvanquished one (a subfolder of Unvanquished cannot do that either).

So I really understand that things are still rough in some corners, but wow, can we just appreciate what we have without doing huge changes like merging repositories, switching to lfs, editing tools accordingly, etc? We already have a lot of solutions for most of our problems, even if they may be rough they exist and are usable.

@ghost
Author

ghost commented May 28, 2022

To add some context, see #1352 (I'm just placing this as a reference, since I'm browsing issues right now to do some cheap maintenance)

@illwieckz
Member

You probably meant Unvanquished/Unvanquished#1352

That makes me think that what I like in GitLab is that you can have groups of repositories and a single parent issue list for all repositories.

@ghost
Author

ghost commented May 28, 2022

Dang... yes, it's what I meant. Thanks.
Yes, I would like a real project-oriented forge, too.
