Research Notes
References
Open Questions
- Planning - Should we squash history? For how long? >>> Less data to migrate
- Planning - What extensions to migrate to LFS
- Planning - What branches can we remove before the migration (fewer branches makes it a lot easier)
- Planning - We need a procedure for devs to migrate their forks and branches
  - master is easy
      git rebase --onto upstream/master
  - rebase-type branches
      git rebase --onto upstream/master <last master rebased commit hash>
  - merge-type branches
      git rebase --onto upstream/master <last master ??? commit hash, feels like this can be the branch-off point?>
  - branches messing with LFS stored files >> good question
- Planning - What efforts should we finish merging before trying to do this
- Planning - We need a proper mirrored repo as full back up
- Planning - Highlight all PRs get automatically closed, this is pretty impactful...
- GIT - Is there a way to find out which master commit a branch was originally based on (or the last master commit it branched off from after its last rebase)? > There should be, needs some testing
- GIT - Can LFS still verify but not lock? There's the
git config lfs.<url>.locksverify false
setting, but it seems to disable both locking and verifying. I'd like to avoid having to push every single file on the first push of a rebased master of a fork, since that costs a lot of bandwidth.
- Worker - Do we need unique users?
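On the question of finding the commit a branch was based on: `git merge-base` prints the most recent common ancestor of two refs. The sketch below wraps it and feeds the result into the `git rebase --onto` form used in the list above. It is only a sketch under assumptions: the function names are made up for illustration, and `old_master` must point at the master history from before the LFS rewrite (for example a backup remote), because the rewrite changes commit hashes and the rewritten master no longer shares those commits with old branches.

```shell
#!/bin/bash
# Print the commit where <branch> last shared history with <old_master>.
# Assumption: <old_master> is a ref to the pre-migration master history.
branch_off_point() {
    local old_master="$1" branch="$2"
    git merge-base "$old_master" "$branch"
}

# Replay only the branch's own commits onto the rewritten master.
# <new_master> is a placeholder, e.g. upstream/master after the migration.
rebase_onto_new_master() {
    local new_master="$1" old_master="$2" branch="$3"
    local base
    base=$(branch_off_point "$old_master" "$branch") || return 1
    git rebase --onto "$new_master" "$base" "$branch"
}
```

For a merge-type branch, `git merge-base` returns the last master commit that was merged into the branch, which may or may not be the right starting point; that still needs testing.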
Useful Commands
File extensions with files larger than 1MB
> find . -type f -size +1M -exec ls -l {} + | grep -v '.git' | sort -rk 5,5 | wc -l
204
> find . -type f -size +1M -exec ls -l {} + | grep -v '.git' | sort -rk 5,5 | awk -F'.' '{print $NF}' | sort | uniq -c | sort -r
155 DDS
14 bin
12 otf
8 dds
5 gltf
2 psd
2 json
2 PCK
1 sfd
1 js
1 TIF
1 PNG
Files matching most binary formats
> find . -type f \( -iname '*.dds' -o -iname '*.bin' -o -iname '*.gltf' -o -iname '*.psd' -o -iname '*.PCK' -o -iname '*.TIF' -o -iname '*.otf' \) -exec ls -l {} + | sort -rk 5,5 | wc -l
284
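The counts above show how many files each extension has, but not how much data each one accounts for. A rough sketch that sums bytes per extension follows; the function name `size_by_extension` and the output layout (bytes, file count, extension, tab-separated) are illustrative choices, not part of the original notes. It takes the extension from the file name rather than the whole path and skips anything under a `.git` directory.

```shell
#!/bin/bash
# Sum file sizes per extension for files above a size threshold.
# Usage: size_by_extension <dir> <min_size>
#   <min_size> is a find -size expression such as +1M (the default).
size_by_extension() {
    local dir="${1:-.}" min_size="${2:-+1M}"
    find "$dir" -type f -size "$min_size" ! -path '*/.git/*' -exec wc -c {} + |
        awk '
            # wc adds a "<n> total" line per batch; real paths printed by
            # find always contain a "/", so lines without one are skipped.
            index($0, "/") == 0 { next }
            {
                size = $1
                name = $0
                sub(/.*\//, "", name)          # keep only the file name
                n = split(name, parts, ".")
                ext = (n > 1) ? parts[n] : "(none)"
                bytes[ext] += size
                count[ext]++
            }
            END {
                for (ext in bytes)
                    printf "%.0f\t%d\t%s\n", bytes[ext], count[ext], ext
            }' |
        sort -rn
}
```

Running `size_by_extension . +1M` from the repository root lists the heaviest extensions first, which may help decide which ones are worth moving to LFS.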
Scripts for testing
Script to mirror a large repo
GitHub rejects any single push larger than 2 GB, which is a problem for a large repository like aircraft. The example script below mirrors a large repo in chunks of commits while still taking all branches.
#!/bin/bash
set -o errexit
# Original repository URL
ORIGIN_URL="git@github.com:flybywiresim/aircraft.git"
# New empty repository URL
NEW_REPO_URL="git@github.com:pdellaert/fbw-aircraft-git-lfs.git"
# Chunk size for commits
CHUNK_SIZE=200
# Main branch (make sure it is done first)
MAIN_BRANCH="master"
# Clone the original repository with all branches and all data
git clone --mirror "$ORIGIN_URL" cloned-repo
# Change into the cloned repository directory
cd cloned-repo
# Get all the branch names except the main branch (exact match, so names that merely contain "master" are kept)
branches=$(git for-each-ref --format='%(refname:short)' refs/heads | grep -vx "$MAIN_BRANCH" || true)
# Loop through all the branches
for branch in $MAIN_BRANCH $branches; do
    echo "Processing branch: $branch"
    # Get the total number of commits for the branch
    count=$(git rev-list --count "$branch")
    # Iterate through the commits in chunks, pushing them to the new repository
    for ((i=0; i<count; i+=CHUNK_SIZE)); do
        start=$((i+1))
        end=$((i+CHUNK_SIZE))
        if [ "$end" -gt "$count" ]; then
            end=$count
        fi
        echo "Pushing commits $start to $end"
        # Get the last commit of the current chunk (rev-list --reverse lists oldest first)
        chunk_tip=$(git rev-list --reverse "$branch" | sed -n "${end}p")
        # Force-push the branch up to that commit; its ancestors are sent along with it
        git push "$NEW_REPO_URL" "+$chunk_tip:refs/heads/$branch"
    done
done
# Change out of the cloned repository directory
cd ..
# Remove the cloned repository
rm -rf cloned-repo
echo "Done mirroring repository"
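The chunk arithmetic in the mirror script above is easy to get wrong at the edges (a commit count that is an exact multiple of the chunk size, or smaller than one chunk), so it can help to check it in isolation. The helper below is a hypothetical `chunk_ends` function, not part of the script: it prints the 1-based position, counted from the oldest commit, of the last commit in each chunk.

```shell
#!/bin/bash
# Print the position (oldest commit = 1) of the last commit in each chunk.
# Usage: chunk_ends <commit_count> <chunk_size>
chunk_ends() {
    local count="$1" size="$2" end
    end=$size
    while [ "$end" -lt "$count" ]; do
        echo "$end"
        end=$((end + size))
    done
    # The final push always ends on the branch tip, whatever the remainder is.
    if [ "$count" -gt 0 ]; then
        echo "$count"
    fi
}
```

Each printed position can be turned into a hash with `git rev-list --reverse <branch> | sed -n "<position>p"`, and pushing that hash also sends all of its ancestors.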
Script to migrate a large repo to LFS
Migrates an existing large repository to LFS.
In one push per branch
#!/bin/bash
set -o errexit
# Variables
MIGRATE_REPO_URL="git@github.com:pdellaert/fbw-aircraft-git-lfs.git"
CUSTOM_LFS_SERVER="https://git-lfs.fbw.straks.dev/aircraft.git"
EXTENSIONS=("*.bin" "*.bnk" "*.dds" "*.DDS" "*.gltf" "*.otf" "*.PCK" "*.PNG" "*.psd" "*.TIF" "*.wav")
MAIN_BRANCH="master"
# 1. Clone the original repository (regular clone)
git clone "$MIGRATE_REPO_URL" migrate-repo # Clone the original repository into a directory called "migrate-repo"
cd migrate-repo # Change directory into the cloned repository
# 2. Make sure to capture all the branches
# Exclude the HEAD alias and the main branch itself (anchored, so names that merely contain "master" are kept)
for branch in $(git branch -r | grep -v HEAD | grep -v " origin/$MAIN_BRANCH\$"); do
    echo "${branch}"
    git checkout --track "$branch" # Check out the remote branch locally and track it
    git pull
done
git checkout "${MAIN_BRANCH}"
git pull
# 3. Set up Git LFS and rewrite history
git lfs install # Initialize Git LFS in the repository
git config lfs.url "$CUSTOM_LFS_SERVER" # Set the custom LFS server URL
#git config "lfs.$CUSTOM_LFS_SERVER.locksverify" false
git lfs migrate import --everything --include="$(IFS=,; echo "${EXTENSIONS[*]}")" # Rewrite all refs so matching files are stored in LFS
# 4. Push every migrated branch, one push per branch
for branch in $MAIN_BRANCH $(git branch -r | grep -v HEAD | grep -v " origin/$MAIN_BRANCH\$"); do
    echo "${branch}"
    git checkout "${branch#origin/}" # Check out the local branch created in step 2
    git push --force origin "${branch#origin/}" # Overwrite the remote branch with the rewritten history
done
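Before rewriting history it may be worth previewing what the migration would move. The sketch below builds the comma-separated `--include` value the same way the script above does and passes it to `git lfs migrate info`, which only reports and does not rewrite anything. The function names `join_patterns` and `preview_lfs_migration` are made up for this example.

```shell
#!/bin/bash
# Join all arguments with commas, the format --include expects.
join_patterns() {
    local IFS=,
    echo "$*"
}

# Report which file types a migration would affect, without rewriting.
# Run inside a clone of the repository; requires git-lfs to be installed.
preview_lfs_migration() {
    git lfs migrate info --everything --include="$(join_patterns "$@")"
}
```

For example, `preview_lfs_migration '*.dds' '*.DDS' '*.bin'` run inside the clone prints a per-pattern summary for all refs. The patterns are quoted so the shell does not expand them against files in the current directory.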
In chunks in case the repo is still too large
#!/bin/bash
set -o errexit
# Variables
MIGRATE_REPO_URL="git@github.com:pdellaert/fbw-aircraft-git-lfs.git"
CUSTOM_LFS_SERVER="https://git-lfs.fbw.straks.dev/aircraft.git"
EXTENSIONS=("*.bin" "*.bnk" "*.dds" "*.DDS" "*.gltf" "*.otf" "*.PCK" "*.PNG" "*.psd" "*.TIF" "*.wav")
CHUNK_SIZE=200
MAIN_BRANCH="master"
# 1. Clone the original repository (regular clone)
git clone "$MIGRATE_REPO_URL" migrate-repo # Clone the original repository into a directory called "migrate-repo"
cd migrate-repo # Change directory into the cloned repository
# 2. Make sure to capture all the branches
# Exclude the HEAD alias and the main branch itself (anchored, so names that merely contain "master" are kept)
for branch in $(git branch -r | grep -v HEAD | grep -v " origin/$MAIN_BRANCH\$"); do
    echo "${branch}"
    git checkout --track "$branch" # Check out the remote branch locally and track it
    git pull
done
git checkout "${MAIN_BRANCH}"
git pull
# 3. Set up Git LFS and rewrite history
git lfs install # Initialize Git LFS in the repository
git config lfs.url "$CUSTOM_LFS_SERVER" # Set the custom LFS server URL
git config "lfs.$CUSTOM_LFS_SERVER.locksverify" false
git lfs migrate import --everything --include="$(IFS=,; echo "${EXTENSIONS[*]}")" # Rewrite all refs so matching files are stored in LFS
# 4. Push every migrated branch in chunks
for branch in $MAIN_BRANCH $(git branch -r | grep -v HEAD | grep -v " origin/$MAIN_BRANCH\$"); do
    echo "${branch}"
    git checkout "${branch#origin/}" # Check out the local branch created in step 2
    # Determine chunks
    TOTAL_COMMITS=$(git rev-list --count HEAD) # Count total number of commits in the branch
    CHUNKS=$(((TOTAL_COMMITS + CHUNK_SIZE - 1) / CHUNK_SIZE)) # Round up; an exact multiple of CHUNK_SIZE must not add an empty chunk, which would push ":branch" and delete it
    # Push in chunks, oldest chunk first
    for i in $(seq $((CHUNKS - 1)) -1 0); do # Loop through each chunk
        COMMIT_HASH=$(git rev-list --max-count=1 --skip=$((i * CHUNK_SIZE)) HEAD) # Newest commit of the chunk (rev-list lists newest first)
        git push --force origin "$COMMIT_HASH:refs/heads/${branch#origin/}" # Push the chunk, removing the "origin/" prefix from the branch name
    done
done
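After the chunked pushes finish, a quick sanity check is to compare every local branch head with the corresponding head on the remote. A minimal sketch, assuming the remote is called `origin` as in the script above; `unpushed_branches` is a hypothetical helper name, and it makes one `git ls-remote` call per branch, so it is slow on repositories with many branches.

```shell
#!/bin/bash
# Print every local branch whose head differs from (or is missing on) the remote.
# Usage: unpushed_branches [remote]   (default remote: origin)
unpushed_branches() {
    local remote="${1:-origin}" branch local_hash remote_hash
    git for-each-ref --format='%(refname:short)' refs/heads |
        while read -r branch; do
            local_hash=$(git rev-parse "refs/heads/$branch")
            # ls-remote prints "<hash><TAB><ref>"; keep only the hash.
            remote_hash=$(git ls-remote "$remote" "refs/heads/$branch" | cut -f1)
            [ "$local_hash" = "$remote_hash" ] || echo "$branch"
        done
}
```

Empty output means every local branch head is present on the remote with the same hash.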