Research Notes

Open Questions

  • Planning - Should we squash history, and if so, how far back? Squashing means less data to migrate.
  • Planning - Which file extensions should be migrated to LFS?
  • Planning - Which branches can we remove before the migration? Fewer branches make it a lot easier.
  • Planning - We need a procedure for devs to migrate their forks and branches:
    • master: easy, git rebase --onto upstream/master
    • rebase-type branches: git rebase --onto upstream/master <hash of the last master commit the branch was rebased onto>
    • merge-type branches: git rebase --onto upstream/master <hash of the last master ??? commit; feels like this could be the branch-off point?>
    • branches that modify LFS-stored files: good question, still open
  • Planning - Which ongoing efforts should be finished and merged before we attempt this?
  • Planning - We need a proper mirrored repo as a full backup.
  • Planning - Highlight that all open PRs will get closed automatically; this is pretty impactful.
  • GIT - Is there a way to find out which master commit a branch was originally based on (or, after a rebase, the last master commit it branched off from)? There should be; needs some testing.
  • GIT - Can LFS still verify but not lock? There's the lfs.<url>.locksverify false setting, but it seems to disable both locking and verifying. I'd like to avoid having to push every single file on the first push of a fork's rebased master, which would cost a lot of bandwidth.
  • Worker - Do we need unique users?
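For the fork/branch procedure and the question of finding the master commit a branch is based on, git merge-base looks like the answer: run against the old (pre-migration) master, it returns the branch-off point for rebase-type branches and the last merged master commit for merge-type branches, which is the old base that git rebase --onto needs. A minimal sketch, assuming the old history stays reachable under some ref (old-upstream/master below is a hypothetical remote pointing at the backup mirror); rebase_onto_migrated is a made-up name, and branches that change LFS-stored files are not handled.

```shell
#!/bin/bash
# Hypothetical helper: move a branch from the old (pre-LFS) history onto the
# migrated history. old_base must still point at the old master, e.g. through
# a remote for the backup mirror (an assumption, not decided yet).
rebase_onto_migrated() {
  local branch=$1 old_base=$2 new_base=$3
  local fork_point

  # Most recent old-master commit contained in the branch: the branch-off
  # point for rebase-type branches, the last merged master commit for
  # merge-type branches
  fork_point=$(git merge-base "$branch" "$old_base") || return 1
  echo "$branch: replaying commits since $(git rev-parse --short "$fork_point")"

  # Replay only fork_point..branch onto the new base; a plain rebase drops
  # merge commits and linearises the branch
  git rebase --onto "$new_base" "$fork_point" "$branch"
}
```

Usage, with hypothetical remote names: rebase_onto_migrated my-branch old-upstream/master upstream/master. Conflicts still have to be resolved as in any rebase.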

Useful Commands

File extensions with files larger than 1MB

> find . -type f -size +1M -exec ls -l {} + | grep -v '.git' | sort -rk 5,5 | wc -l
     204

> find . -type f -size +1M -exec ls -l {} + | grep -v '.git' | sort -rk 5,5 | awk -F'.' '{print $NF}' | sort | uniq -c | sort -r
 155 DDS
  14 bin
  12 otf
   8 dds
   5 gltf
   2 psd
   2 json
   2 PCK
   1 sfd
   1 js
   1 TIF
   1 PNG
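A variant of the two commands above that may be more robust: -prune skips the .git directory itself, whereas grep -v '.git' drops every ls -l line containing .git anywhere (including .github/ and .gitattributes, and since the . is unescaped, any character followed by git matches too). It also reads the extension from the file name rather than from the whole ls -l line. A minimal sketch; large_file_extensions is a hypothetical name and its counts have not been compared with the numbers above.

```shell
#!/bin/bash
# Hypothetical helper: count files larger than 1 MiB per extension,
# skipping the .git directory. Extensions stay case-sensitive (DDS vs dds).
large_file_extensions() {
  local dir=${1:-.}
  find "$dir" -path "$dir/.git" -prune -o -type f -size +1M -print |
    awk -F/ '{
      # Extension = text after the last "." of the file name, if any
      n = split($NF, parts, ".")
      print (n > 1 ? parts[n] : "(none)")
    }' |
    sort | uniq -c | sort -rn
}
```

Usage from the repository root: large_file_extensions .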

Files matching most binary formats

> find . -type f \( -iname '*.dds' -o -iname '*.bin' -o -iname '*.gltf' -o -iname '*.psd' -o -iname '*.pck' -o -iname '*.tif' -o -iname '*.otf' \) -exec ls -l {} + | wc -l
     284
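To estimate how much data would move to the LFS server, the same search can also sum the file sizes. A rough sketch: lfs_candidate_size is a hypothetical name, it only looks at the current checkout (older versions stored in history are not counted), and since -iname is case-insensitive, DDS and dds need only one pattern.

```shell
#!/bin/bash
# Hypothetical helper: number and total size of files matching the given
# extensions (case-insensitive), skipping the .git directory.
# Usage: lfs_candidate_size <dir> <ext>...
lfs_candidate_size() {
  local dir=$1; shift
  local expr=() ext
  # Build: -iname '*.ext1' -o -iname '*.ext2' ... (leading -o dropped below)
  for ext in "$@"; do
    expr+=(-o -iname "*.$ext")
  done
  find "$dir" -path "$dir/.git" -prune -o -type f \( "${expr[@]:1}" \) -exec wc -c {} + |
    awk '$2 != "total" { n++; sum += $1 }
         END { printf "%d files, %.0f bytes\n", n, sum }'
}
```

Usage: lfs_candidate_size . dds bin gltf psd pck tif otf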

Scripts for testing

Script to mirror a large repo

GitHub only allows pushes of at most 2GB at a time, which can be an issue with a large repository like the aircraft repo. Here's an example script that mirrors a large repo in chunks while still including all branches.

#!/bin/bash
set -o errexit

# Original repository URL
ORIGIN_URL="git@github.com:flybywiresim/aircraft.git"
# New empty repository URL
NEW_REPO_URL="git@github.com:pdellaert/fbw-aircraft-git-lfs.git"
# Chunk size for commits
CHUNK_SIZE=200
# Main branch (make sure it is done first)
MAIN_BRANCH="master"

# Clone the original repository with all branches and all data
git clone --mirror $ORIGIN_URL cloned-repo

# Change into the cloned repository directory
cd cloned-repo

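# Get all other branch names (exact match, so branches that merely contain
# "master" in their name are kept)
branches=$(git for-each-ref --format='%(refname:lstrip=2)' refs/heads/ | grep -vx "$MAIN_BRANCH" || true)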

# Loop through all the branches
for branch in $MAIN_BRANCH $branches; do
  echo "Processing branch: $branch"

  # List the branch's commits once, oldest first
  commits=($(git rev-list --reverse "refs/heads/$branch"))
  count=${#commits[@]}

  # Push in chunks of CHUNK_SIZE commits: pushing the last commit of a chunk
  # also uploads every ancestor the new repository does not have yet
  for ((start = 0; start < count; start += CHUNK_SIZE)); do
    end=$(( start + CHUNK_SIZE < count ? start + CHUNK_SIZE : count ))
    echo "Pushing commits $((start + 1)) to $end of $count"
    git push "$NEW_REPO_URL" "+${commits[end - 1]}:refs/heads/$branch"
  done
done

# Change out of the cloned repository directory
cd ..

# Remove the cloned repository
rm -rf cloned-repo

echo "Done mirroring repository"
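Before relying on the new repository as the full backup mentioned under Open Questions, it may be worth checking that every branch tip matches the original. This only makes sense right after mirroring, before the LFS migration rewrites the branches. A minimal sketch using git ls-remote; verify_mirror is a hypothetical name, and tags are not compared since the script only pushes branches.

```shell
#!/bin/bash
# Hypothetical helper: compare branch tips of two repositories (URLs or paths).
# Returns 0 when every branch exists in both with the same commit hash.
verify_mirror() {
  local origin=$1 mirror=$2
  # ls-remote prints "<hash><TAB>refs/heads/<branch>" per branch
  if diff <(git ls-remote --heads "$origin" | sort -k2) \
          <(git ls-remote --heads "$mirror" | sort -k2); then
    echo "All branches match"
  else
    echo "Branch tips differ (see diff output above)" >&2
    return 1
  fi
}
```

Usage, with the variables from the script: verify_mirror "$ORIGIN_URL" "$NEW_REPO_URL"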

Script to migrate a large repo to LFS

Rewrites the history of an existing large repository so that the listed file types are stored in LFS, then force-pushes the rewritten branches.

In one push per branch

#!/bin/bash
set -o errexit

# Variables
MIGRATE_REPO_URL="git@github.com:pdellaert/fbw-aircraft-git-lfs.git"
CUSTOM_LFS_SERVER="https://git-lfs.fbw.straks.dev/aircraft.git"
EXTENSIONS=("*.bin" "*.bnk" "*.dds" "*.DDS" "*.gltf" "*.otf" "*.PCK" "*.PNG" "*.psd" "*.TIF" "*.wav")
CHUNK_SIZE=200
MAIN_BRANCH="master"

# 1. Clone the original repository (regular clone)
git clone $MIGRATE_REPO_URL migrate-repo  # Clone the original repository into a directory called "migrate-repo"
cd migrate-repo                           # Change directory into the cloned repository

# 2. Create a local branch for every remote branch, so the migration rewrites all of them
for branch in $(git for-each-ref --format='%(refname:lstrip=3)' refs/remotes/origin/ | grep -vx -e HEAD -e "$MAIN_BRANCH"); do
  echo "${branch}"
  git branch --track "$branch" "origin/$branch"   # Local branch tracking the remote one
done
git checkout "${MAIN_BRANCH}"

# 3. Set up Git LFS
git lfs install                            # Initialize Git LFS in the repository
git config lfs.url $CUSTOM_LFS_SERVER      # Set the custom LFS server URL
#git config lfs.$CUSTOM_LFS_SERVER.locksverify false

git lfs migrate import --everything --include="$(IFS=,; echo "${EXTENSIONS[*]}")" # Rewrite the history of all refs (--everything) so files matching EXTENSIONS are stored in LFS

# 4. Force-push every rewritten branch, one push per branch, main branch first
for branch in $MAIN_BRANCH $(git for-each-ref --format='%(refname:lstrip=2)' refs/heads/ | grep -vx "$MAIN_BRANCH"); do
  echo "${branch}"
  git push --force origin "refs/heads/$branch:refs/heads/$branch"
done
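After git lfs migrate import, a quick way to check that nothing large was missed (for example an extension not in EXTENSIONS, such as the js and json files in the size list above) is to list the blobs still stored in Git itself. A sketch using plain Git plumbing; large_blobs and its 1 MiB default threshold are assumptions. Migrated files should only remain as small pointer blobs, so ideally the output is empty.

```shell
#!/bin/bash
# Hypothetical helper: list blobs larger than a limit (default 1 MiB) that are
# reachable from any ref, largest first, as "<size> <path>".
large_blobs() {
  local limit=${1:-1048576}
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk -v limit="$limit" '$1 == "blob" && $2 > limit {
      size = $2
      $1 = ""; $2 = ""
      sub(/^ +/, "")          # what remains is the path
      print size, $0
    }' |
    sort -rn
}
```

Usage, inside migrate-repo after the migration: large_blobs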

In chunks in case the repo is still too large

#!/bin/bash
set -o errexit

# Variables
MIGRATE_REPO_URL="git@github.com:pdellaert/fbw-aircraft-git-lfs.git"
CUSTOM_LFS_SERVER="https://git-lfs.fbw.straks.dev/aircraft.git"
EXTENSIONS=("*.bin" "*.bnk" "*.dds" "*.DDS" "*.gltf" "*.otf" "*.PCK" "*.PNG" "*.psd" "*.TIF" "*.wav")
CHUNK_SIZE=200
MAIN_BRANCH="master"

# 1. Clone the original repository (regular clone)
git clone $MIGRATE_REPO_URL migrate-repo  # Clone the original repository into a directory called "migrate-repo"
cd migrate-repo                           # Change directory into the cloned repository

# 2. Create a local branch for every remote branch, so the migration rewrites all of them
for branch in $(git for-each-ref --format='%(refname:lstrip=3)' refs/remotes/origin/ | grep -vx -e HEAD -e "$MAIN_BRANCH"); do
  echo "${branch}"
  git branch --track "$branch" "origin/$branch"   # Local branch tracking the remote one
done
git checkout "${MAIN_BRANCH}"

# 3. Set up Git LFS
git lfs install                            # Initialize Git LFS in the repository
git config lfs.url $CUSTOM_LFS_SERVER      # Set the custom LFS server URL
git config lfs.$CUSTOM_LFS_SERVER.locksverify false

git lfs migrate import --everything --include="$(IFS=,; echo "${EXTENSIONS[*]}")" # Rewrite the history of all refs (--everything) so files matching EXTENSIONS are stored in LFS

# 4. Force-push every rewritten branch in chunks, main branch first
for branch in $MAIN_BRANCH $(git for-each-ref --format='%(refname:lstrip=2)' refs/heads/ | grep -vx "$MAIN_BRANCH"); do
  echo "${branch}"

  # Determine chunks, rounded up so the oldest chunk always starts at an existing commit
  TOTAL_COMMITS=$(git rev-list --count "refs/heads/$branch")
  CHUNKS=$(( (TOTAL_COMMITS + CHUNK_SIZE - 1) / CHUNK_SIZE ))

  # Push in chunks, oldest first: chunk i ends at the commit i * CHUNK_SIZE commits behind the tip
  for i in $(seq $((CHUNKS - 1)) -1 0); do
    COMMIT_HASH=$(git rev-list --max-count=1 --skip=$((i * CHUNK_SIZE)) "refs/heads/$branch")
    git push --force origin "$COMMIT_HASH:refs/heads/$branch" # Move the remote branch forward to the end of this chunk
  done
done
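CHUNK_SIZE counts commits, not bytes, so whether 200 commits stay under GitHub's 2GB push limit depends on the history. A rough sketch to check this beforehand: it walks the same chunk boundaries as the loop above and prints the uncompressed size of the objects each chunk adds. Assumptions: chunk_sizes is a hypothetical name; uncompressed sizes overestimate the packed push; objects the remote already has (e.g. master's history when pushing another branch) are still counted; after the migration, LFS-tracked files are only small pointer blobs in Git, since their content goes to the LFS server.

```shell
#!/bin/bash
# Hypothetical helper: for each chunk boundary used by the chunked push
# (oldest first), print "<commit> <bytes>", where <bytes> is the uncompressed
# size of the objects reachable from that commit but not from the previous one.
chunk_sizes() {
  local branch=$1 chunk_size=$2
  local total chunks i commit prev="" bytes
  local -a range
  total=$(git rev-list --count "$branch")
  chunks=$(( (total + chunk_size - 1) / chunk_size ))
  for ((i = chunks - 1; i >= 0; i--)); do
    commit=$(git rev-list --max-count=1 --skip=$((i * chunk_size)) "$branch")
    # New objects for this chunk: reachable from commit, excluding prev
    range=("$commit")
    [ -n "$prev" ] && range+=("^$prev")
    bytes=$(git rev-list --objects "${range[@]}" |
      git cat-file --batch-check='%(objectsize) %(rest)' |
      awk '{ sum += $1 } END { printf "%.0f", sum }')
    echo "$commit $bytes"
    prev=$commit
  done
}
```

Usage, inside migrate-repo: chunk_sizes master 200. Chunks getting close to 2GB would call for a smaller CHUNK_SIZE.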