Some fun with git - keeping a subdirectory of a repo separate without submodules
I ran into an interesting issue this past week. When I searched around for solutions I didn't find any, so I'm sharing this here, both as documentation for my future self and so that others can find it if they need it. I have my thesis hosted in a private GitHub repository (thanks to the awesome student package GitHub offers). Unfortunately, my advisor wanted my code to be in our research group's repository, which is:
- huge
- not structured remotely like my current repo
- full of things I don't care about and don't need for my daily work (i.e. other people's research)
- not hosted on a public-facing server
On top of this, it has none of my history. I looked into doing a subtree merge to solve this, but I would still have three problems: my GitHub repo would go stale (the research group's repo is several gigabytes, and I believe GitHub discourages pushing anything that large), the directory structure would change, and my code would no longer be on a public-facing server. I want it public-facing because I might make it public at some point, and because it lets me push and pull from home without a VPN.
The best option would probably be a submodule, which lets you map a section of one repo to another repo and push/pull at will. My advisor doesn't like this, though: it complicates the repo, and if my secondary repo is ever taken down (because I delete my GitHub account or something), the group repo may be left with an old version and no way to get the updated, now deleted, code. The other option, the subtree merge mentioned above, would fully merge the repos, and keeping my GitHub copy in sync would mean filling it with the huge group repo.
To solve this, I thought about how git is structured and came up with a solution: pull from GitHub into my local copy of the group's repo. I never push to GitHub from the group's repo; I keep working in my current setup and pull into the group repo from time to time.
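One way to enforce the "pull only" rule is to add GitHub as a second remote in the group repo and break its push URL. The sketch below is a minimal illustration, not the exact setup I used: it assumes a remote named `github` and a placeholder URL (both hypothetical), and it simply wraps the git commands, which you could just as well type by hand.

```python
from __future__ import annotations

import subprocess

# Hypothetical values: substitute your own remote name and repository URL.
REMOTE_NAME = "github"
REMOTE_URL = "git@github.com:your-user/thesis.git"


def setup_commands(remote: str, url: str) -> list[list[str]]:
    """git commands to run once, inside the local clone of the group repo."""
    return [
        # Register the personal GitHub repo as a second remote.
        ["git", "remote", "add", remote, url],
        # Replace only this remote's *push* URL with a bogus location, so an
        # accidental `git push github` fails instead of uploading the group repo.
        ["git", "remote", "set-url", "--push", remote, "no_push"],
    ]


def run(commands: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(setup_commands(REMOTE_NAME, REMOTE_URL))
```

Afterwards, `git remote -v` should show the GitHub URL for fetching and `no_push` for pushing, so fetches keep working while pushes fail loudly.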
To set this up, I moved the contents of my git repository into a subdirectory, both to avoid merge conflicts and to keep my files together once they were pulled into the group repo. To do this, I created a new directory inside the repo and moved everything into it. This leaves one commit in your history that records every file being moved, which can be undesirable, but I didn't care. If it bothers you, git filter-branch can rewrite the history instead; I found resources on it, but decided it really didn't matter to me.
My git directory was also in an awkward place, so I additionally had to pull in directories from its parent directory before moving the whole repository folder up a level and deleting the original. Concretely, Thesis/Assets/<repo here> became Thesis/<repo here>/Assets: I moved the contents of Assets into Thesis/Assets/Assets, renamed Assets to Thesis, pulled the contents of Thesis into the new subdirectory Thesis, and finally replaced the parent with the child.
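The basic "move everything into a subdirectory" step can be sketched as a single `git mv` followed by a commit. This is a minimal sketch, assuming it is run from the repository root and that the new folder is called `Thesis` (a hypothetical name); it only moves files git already tracks and does not cover the extra parent-directory shuffling described above.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

SUBDIR = "Thesis"  # hypothetical name for the new top-level folder


def tracked_top_level() -> list[str]:
    """Names of the files and folders git tracks at the repository root."""
    out = subprocess.run(
        ["git", "ls-tree", "-z", "--name-only", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    # -z separates entries with NUL bytes, so unusual file names survive intact.
    return [name for name in out.split("\0") if name]


def move_command(entries: list[str], subdir: str) -> list[str]:
    """Build one `git mv` that moves every root entry except subdir into subdir."""
    to_move = sorted(e for e in entries if e != subdir)
    if not to_move:
        raise ValueError("nothing to move")
    return ["git", "mv", *to_move, subdir]


if __name__ == "__main__":
    Path(SUBDIR).mkdir(exist_ok=True)  # git mv needs the destination to exist
    subprocess.run(move_command(tracked_top_level(), SUBDIR), check=True)
    subprocess.run(
        ["git", "commit", "-m", f"Move repository contents into {SUBDIR}/"],
        check=True,
    )
```

This produces exactly the single "move" commit mentioned above; `git log --follow` on an individual file can still trace its history across the move.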
After that, it was a simple task to fetch from GitHub into the group's repo and merge into my branch there. When I want to update my code, I push to GitHub from my working repo and then pull from GitHub into the group repo. In effect, this mimics the behavior of submodules as I understand them, which weren't quite an option here. I'll try to argue my case, but for now this is it.
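The update routine on the group-repo side boils down to checkout, fetch, merge. The sketch below assumes a remote named `github`, a GitHub branch `master`, and a personal branch `thesis` in the group repo; all three names are hypothetical. Because the two repositories start out with no commits in common, the very first merge needs `--allow-unrelated-histories` (Git 2.9 and later refuse such merges by default).

```python
from __future__ import annotations

import subprocess
import sys

# Hypothetical names: the remote added earlier, the branch on GitHub,
# and the personal branch inside the group repo.
REMOTE = "github"
REMOTE_BRANCH = "master"
LOCAL_BRANCH = "thesis"


def update_commands(remote: str, remote_branch: str, local_branch: str,
                    first_time: bool = False) -> list[list[str]]:
    """Fetch the personal repo and merge it into the personal branch."""
    merge = ["git", "merge", f"{remote}/{remote_branch}"]
    if first_time:
        # The histories share no commits yet, so the first merge needs this
        # flag; later merges have a common ancestor and don't.
        merge.insert(2, "--allow-unrelated-histories")
    return [
        ["git", "checkout", local_branch],
        ["git", "fetch", remote],
        merge,
    ]


if __name__ == "__main__":
    first = "--first" in sys.argv  # pass --first for the initial merge only
    for cmd in update_commands(REMOTE, REMOTE_BRANCH, LOCAL_BRANCH, first):
        subprocess.run(cmd, check=True)
```

Merging rather than rebasing keeps the group repo's history append-only, which seems the safer choice for a shared repository.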
Some downsides include:
- I have to be careful to only pull into the group repo; pushing from it would put another copy of the huge repo on GitHub
- I have to manually pull from GitHub into the group repo
- Same problem as submodules: I have to make sure the group repo stays up to date
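For the last two downsides, a small check can at least tell you when the group repo has fallen behind. This is a sketch under the same hypothetical names as before (`github` remote, `master` branch), meant to be run inside the group repo while the personal branch is checked out; it counts commits on GitHub that the current branch does not contain yet.

```python
from __future__ import annotations

import subprocess

REMOTE = "github"          # hypothetical remote name
REMOTE_BRANCH = "master"   # hypothetical branch on GitHub


def behind_command(remote: str, branch: str) -> list[str]:
    """Count commits on remote/branch that HEAD does not contain yet."""
    return ["git", "rev-list", "--count", f"HEAD..{remote}/{branch}"]


def parse_count(stdout: str) -> int:
    """git rev-list --count prints a single integer followed by a newline."""
    return int(stdout.strip())


if __name__ == "__main__":
    subprocess.run(["git", "fetch", REMOTE], check=True)
    out = subprocess.run(behind_command(REMOTE, REMOTE_BRANCH),
                         check=True, capture_output=True, text=True).stdout
    behind = parse_count(out)
    if behind:
        print(f"Group repo is {behind} commit(s) behind {REMOTE}/{REMOTE_BRANCH}")
    else:
        print("Group repo is up to date")
```

This does not remove the manual step; it only makes it obvious when the merge from the update routine is due.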
It's a very straightforward workaround, but it took me some time to puzzle out the details because I'm so used to thinking of remotes as two-way. Having a remote I only pull from is somewhat strange, but it does the job for now!