Cloudflare Git LFS Worker
Cloudflare Workers
A Cloudflare Worker is a serverless application executed in a CDN-style approach: as close as possible to the system making the request. It is an application that runs in isolation whenever a request is received: it handles the request and, once it has returned a response, it is terminated.
This means there are limited capabilities for sharing data between two different requests. There are several solutions to achieve persistence:
- Databases
- Key/Value stores (Cloudflare has its own)
- Durable Objects (Cloudflare concept)
Ideally though, a Worker should be as light as possible and only use persistence mechanisms if there really is a need.
Requirements
A Worker that serves as a Git LFS endpoint/service should have the following capabilities/requirements, listed roughly in order of priority.
- Implement the Git LFS Batch API, with the following actions:
- upload
- download
- verify
- Implement token-based authentication, with a minimum of two unique entities:
- read-only
- read-write
- Implement the basic transfer adapter
- Use an R2 bucket for storage
- Support any type of file and file size
- Return proper response codes as per Git LFS API design
- Limit the amount of time a link to upload, download or verify is valid
- Must be configurable
- (Optional) Implement Locking API
- If not implemented, it should return 404
Git LFS API
Below are some relevant details about the Git LFS API that needs to be implemented, and how this affects the Worker that will be implemented.
For full details, consult the Git LFS API documentation
Overview
In general the Git LFS API is designed in such a way that it doesn't handle the files itself. It has a Batch API component, which receives a batch of requests for operations/actions on a file, for which it returns a URL for the operation/action to be executed. It also has a Locking API, which will put a lock on a file so no other process can manipulate that file.
By separating the Batch API from the actual file operations, the LFS API server can be an entity that only generates URLs pointing to a secondary service (for instance the R2 API), so the LFS API server does not have to handle the file manipulation itself.
All Batch API requests are executed as HTTP POST requests, and the Batch API is always JSON with a special Content-Type: `application/vnd.git-lfs+json`.
Batch API
The Batch API is used by clients to indicate which operations they want to execute on files in the Git LFS storage. The only supported operations are download and upload. The Batch API does not perform these operations itself; instead it returns a set of actions (URLs) to use to execute the requested operations.
The Batch API receives requests in a format that has the following components:
- `operation` - Type of operation the client wants to take on the objects (`download` or `upload`)
- `transfers` - Client side supported transfer types, typically just `[ "basic" ]`
- `ref`
  - `name` - A git reference to which the objects belong (example: `refs/heads/main`)
- `objects` - List of objects the client wants to perform the `operation` on
  - `oid` - Unique object ID, which is a unique hash computed with the `hash_algo`
  - `size` - Size of the object
- `hash_algo` - Hashing algorithm for determining the `oid`, default is `sha256`

Note on `transfers` - Officially only `basic` is supported, but there are some experimental ones.
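As an illustration, a minimal `download` request for a single object could look like the following sketch; the repository, `oid` and `size` values are made up for the example:

```typescript
// Illustrative Batch API request body for a single-object download.
// Sent as POST <server>/<repository>/objects/batch
// with Content-Type: application/vnd.git-lfs+json
const exampleBatchRequest = {
  operation: "download",
  transfers: ["basic"],
  ref: { name: "refs/heads/main" },
  objects: [
    {
      oid: "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
      size: 1048576,
    },
  ],
  hash_algo: "sha256",
};
```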
The response to a Batch API request contains the following components:
- `transfer` - Server side preferred transfer type (typically `basic`)
- `objects` - List of the objects to execute the operation on; should be the same set of objects as in the request
  - `oid` - Unique object ID, same as in the request
  - `size` - Object size in bytes, same as in the request
  - `authenticated` - Indicates if the actions to take are authenticated
  - `actions` - Object with the actions to take to meet the requested operation. Each entry in this object references either `download`, `upload` or `verify`
    - `href` - URL to use for the action
    - `header` - Any special headers that need to be provided with the request, for instance an authentication header
    - `expires_in` - Indicates for how long the `href` and `header` will be valid (in seconds)
    - `expires_at` - Exact time the `href` and `header` expire
- `hash_algo` - Similar to the request, defaults to `sha256`
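For reference, a response to an `upload` request could look roughly as follows; the URLs, token and expiry values are purely illustrative:

```typescript
// Illustrative Batch API response for an upload operation, including a verify action.
const exampleBatchResponse = {
  transfer: "basic",
  objects: [
    {
      oid: "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
      size: 1048576,
      authenticated: true,
      actions: {
        upload: {
          href: "https://<account>.r2.cloudflarestorage.com/<bucket>/_GLOBAL_/aircraft.git/9f86d0...?X-Amz-Signature=...",
          expires_in: 3600,
        },
        verify: {
          href: "https://<worker-domain>/aircraft.git/verify",
          header: { "X-FBW-GITLFS-TOKEN": "<generated token>" },
          expires_in: 3600,
        },
      },
    },
  ],
  hash_algo: "sha256",
};
```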
Operations vs Actions
Operations that are requested can be `download` or `upload`, while the `actions` returned can be `download`, `upload` and `verify`.

For the `download` operation, the only action returned has to be `download`. This tells the LFS client to do an HTTP GET request to the `href` provided.

For the `upload` operation, the minimum action returned has to be `upload`, which results in an HTTP PUT request to the `href`. Optionally, a `verify` action can be returned in addition to the `upload` action. The `verify` action results in an HTTP POST request to the `href`.

The `verify` action can be used to verify that the `upload` action has succeeded, by checking that the `oid` and `size` are what the client expects them to be.
Errors
Several errors can occur; the full list of response codes can be found in the Git LFS API documentation. In general, an error will always return a JSON object with the following properties:

- `message` - Human readable error message
- `documentation_url` - Optional URL for more information
- `request_id` - Optional request identifier from the client
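An error body could therefore look like this; the values are illustrative:

```typescript
// Illustrative Git LFS error response body.
const exampleError = {
  message: "Object does not exist",
  documentation_url: "https://<worker-domain>/docs/errors",
  request_id: "a1b2c3",
};
```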
Locking API
The File Locking API is used to create, list, and delete locks, as well as verify that locks are respected in Git pushes.
Overall this is used to prevent multiple pushes modifying the same file at the same time.
This API requires persistence of state across requests and will initially not be implemented.
Git LFS Worker Design
High Level
To meet the requirements the following high level items will be implemented:
- Authentication using 3 unique configurable tokens:
- read
- write
- verify
- Batch API will be implemented to support
- download
- upload
- verify
- Downloads and uploads will go directly to the R2 bucket; file uploads/downloads will never pass through the Worker (Cloudflare has a 100 MB per-request limit for Workers)
- The Batch API will use the R2 API to generate unique secure URLs for downloads and uploads which are only valid for a configurable duration
- The Batch API will generate verify URLs with a unique token that is only valid for a configurable duration
- The Worker will manage verify API calls itself and handle them as follows:
- Check the token is valid (both authentication wise as well as expiration)
- Verify the size in the request matches with the size as seen in the R2 bucket
- Return a 404 error for the Locks API, as per Locks API specifications if not implemented
- Return proper response codes for all situations as highlighted by the Git LFS API documentation
Configurable Settings
The following settings can be configured through environment variables for the Worker:
- `DOWNLOAD_KEY` - A unique key/token that must be provided by the client to be able to request the `download` operation of the Batch API
- `UPLOAD_KEY` - A unique key/token that must be provided by the client to be able to request the `upload` operation of the Batch API
- `VERIFY_KEY` - A unique key/token that is used to generate a token with an expiration time based on the `VALIDITY_DURATION`, which the client needs to use for every `verify` action
- `VALIDITY_DURATION` - A value in seconds that defines how long a `verify` token is valid and, similarly, how long a `download` or `upload` URL for the R2 bucket is valid
- `R2_BUCKET_URL` - The URL of the R2 Bucket
- `R2_BUCKET_NAME` - The name of the R2 Bucket
- `R2_BUCKET_KEY_ID` - The unique key ID for the key that can manage the R2 Bucket
- `R2_BUCKET_KEY_SECRET` - The unique key secret for the key that can manage the R2 Bucket
- `bucket_name` - Needed for the Worker system to know which bucket to allow access to
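In TypeScript terms, the Worker environment could be typed roughly as follows; the interface name is an assumption for illustration:

```typescript
// Sketch of the Worker environment bindings, assuming a wrangler.toml that defines
// these vars/secrets and an R2 bucket binding named `bucket_name`.
export interface Env {
  DOWNLOAD_KEY: string;
  UPLOAD_KEY: string;
  VERIFY_KEY: string;
  VALIDITY_DURATION: string; // seconds, provided as an environment variable string
  R2_BUCKET_URL: string;
  R2_BUCKET_NAME: string;
  R2_BUCKET_KEY_ID: string;
  R2_BUCKET_KEY_SECRET: string;
  bucket_name: R2Bucket; // R2 bucket binding, used for the verify `head` lookup
}
```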
Security
The solution will allow uploading files into an R2 Bucket, which means it needs to be protected from external uploads that are not meant for Git LFS. This means security needs to be implemented at multiple levels:
- Batch API - Authenticate and authorize requests for operations, both download and upload
- Verify API - While not a dangerous API, it is best to secure all access
- R2 Bucket access - Generate URLs that are valid for a short period, to make sure the URLs can't be reused or abused
Security Configuration
Security can be configured through the following settings:
- `DOWNLOAD_KEY`
- `UPLOAD_KEY`
- `VERIFY_KEY`
- `VALIDITY_DURATION`
In the initial implementation, the same keys will be used by all users. A future enhancement could be to have a unique token per user.
It is advised that the keys are 128 characters or more and consist of alphanumeric characters.
Bucket API Authentication
Every request to the Bucket API must have an `Authorization` header. This `Authorization` header must match the `Basic <base64 encoded username:token>` format. A username has been added for potential future enhancements where each user has their own unique token.

As an example, if the user `john` wants to make a request to download files, and the configured `DOWNLOAD_KEY` equals `Sae9phua8ieghahK9aeH`, the `Authorization` header would look like:

`Authorization: Basic am9objpTYWU5cGh1YThpZWdoYWhLOWFlSA==`

If the request uses the `UPLOAD_KEY`, it can request both the download and upload operations. If the request uses the `DOWNLOAD_KEY`, only the download operation will be allowed.
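A minimal sketch of how such a header could be checked; the function and type names are illustrative, not the actual implementation:

```typescript
// Sketch: decode "Basic <base64 username:token>" and map the token to an access level.
type Access = "none" | "read" | "write";

function checkAuthorization(
  header: string | null,
  env: { DOWNLOAD_KEY: string; UPLOAD_KEY: string },
): Access {
  if (!header || !header.startsWith("Basic ")) return "none";
  let decoded: string;
  try {
    decoded = atob(header.slice("Basic ".length));
  } catch {
    return "none";
  }
  // Everything after the first ":" is the token; the username is currently ignored.
  const token = decoded.slice(decoded.indexOf(":") + 1);
  if (token === env.UPLOAD_KEY) return "write"; // read/write: upload and download
  if (token === env.DOWNLOAD_KEY) return "read"; // read-only: download
  return "none";
}
```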
R2 URL Security
The R2 Bucket API of Cloudflare is the same API as AWS uses for their S3 buckets, so the same tools can be used, as per their documentation.
Using `aws4fetch`, the Bucket API handler will use the `AwsClient` `sign` method to have the R2 Bucket API securely sign an R2 Bucket HTTP GET (`download` operation) or HTTP PUT (`upload` operation) request. This request will also have a max validity period, equal to `VALIDITY_DURATION`.
This secures downloading and uploading files directly from/to the R2 bucket and prevents unauthenticated access to the data.
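A minimal sketch of generating such a presigned URL with `aws4fetch`, assuming the configuration settings described earlier; the exact code in the Worker may differ:

```typescript
import { AwsClient } from "aws4fetch";

// Sketch: presign an R2 GET (download) or PUT (upload) URL valid for VALIDITY_DURATION seconds.
async function signR2Url(
  env: {
    R2_BUCKET_URL: string;
    R2_BUCKET_NAME: string;
    R2_BUCKET_KEY_ID: string;
    R2_BUCKET_KEY_SECRET: string;
    VALIDITY_DURATION: string;
  },
  method: "GET" | "PUT",
  objectPath: string, // e.g. "_GLOBAL_/aircraft.git/<oid>"
): Promise<string> {
  const client = new AwsClient({
    accessKeyId: env.R2_BUCKET_KEY_ID,
    secretAccessKey: env.R2_BUCKET_KEY_SECRET,
  });

  const url = new URL(env.R2_BUCKET_URL);
  url.pathname = `/${env.R2_BUCKET_NAME}/${objectPath}`;
  url.searchParams.set("X-Amz-Expires", env.VALIDITY_DURATION);

  // signQuery places the signature in the query string, producing a presigned URL.
  const signed = await client.sign(new Request(url, { method }), {
    aws: { signQuery: true },
  });
  return signed.url;
}
```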
Verify API Authentication
The verify API is the least significant and least security-sensitive API. When the Bucket API needs to generate a `verify` `href` as part of the `upload` operation, it will encode the current date/time (Unix timestamp format) using the `VERIFY_KEY`. It will add that as a special `X-FBW-GITLFS-TOKEN` header to the verify action in the response.

When the verify action is executed by the client, the Worker will decode the `X-FBW-GITLFS-TOKEN` header and check the current time against the time encoded in the token. If the difference is larger than the `VALIDITY_DURATION`, it will reject the verify request.
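The exact token format is an implementation detail that the design above does not pin down. One possible sketch, assuming an HMAC of the timestamp keyed with `VERIFY_KEY` via the Web Crypto API available in Workers:

```typescript
// Sketch only: token format "<unix timestamp>.<hex HMAC-SHA-256 of the timestamp>".
async function hmacHex(key: string, message: string): Promise<string> {
  const cryptoKey = await crypto.subtle.importKey(
    "raw",
    new TextEncoder().encode(key),
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["sign"],
  );
  const signature = await crypto.subtle.sign("HMAC", cryptoKey, new TextEncoder().encode(message));
  return [...new Uint8Array(signature)].map((b) => b.toString(16).padStart(2, "0")).join("");
}

async function createVerifyToken(verifyKey: string): Promise<string> {
  const timestamp = Math.floor(Date.now() / 1000).toString();
  return `${timestamp}.${await hmacHex(verifyKey, timestamp)}`;
}

async function checkVerifyToken(token: string, verifyKey: string, validitySeconds: number): Promise<boolean> {
  const [timestamp, mac] = token.split(".");
  if (!timestamp || !mac) return false;
  if (mac !== (await hmacHex(verifyKey, timestamp))) return false; // authentication check
  const age = Math.floor(Date.now() / 1000) - Number(timestamp);
  return age >= 0 && age <= validitySeconds; // expiration check
}
```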
Functional Design
This section provides a relatively high level view of the main functionality of the Git LFS Worker, covering the 4 major components:
- Initial Request handling
- Batch API handling
- Verify API handling
- Lock API handling
Initial Request Handling
When a request is made and the Worker is executed the following initial actions are taken:
- Verify the request method is supported (supported methods: `OPTIONS`, `POST`, `PUT`, `GET`)
- Split up the URL path into a `repository` and an `action`, for example `aircraft.git` and `objects/batch`, which would indicate a Batch API request for the `aircraft.git` repository

If the first check fails, a HTTP 405 Error is returned.

If the `OPTIONS` HTTP method is used, a response is sent back with the `allow` header containing the allowed methods.
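A rough sketch of this initial handling; the function names are illustrative:

```typescript
// Sketch: method check, OPTIONS handling, and splitting "/aircraft.git/objects/batch"
// into repository ("aircraft.git") and action ("objects/batch").
const ALLOWED_METHODS = ["OPTIONS", "POST", "PUT", "GET"];

function parsePath(pathname: string): { repository: string; action: string } | null {
  const segments = pathname.split("/").filter((s) => s.length > 0);
  if (segments.length < 2) return null;
  return { repository: segments[0], action: segments.slice(1).join("/") };
}

function handleInitial(request: Request): Response | null {
  if (!ALLOWED_METHODS.includes(request.method)) {
    return new Response(null, { status: 405 });
  }
  if (request.method === "OPTIONS") {
    // The status code here is an assumption; the design only requires the allow header.
    return new Response(null, { status: 204, headers: { allow: ALLOWED_METHODS.join(", ") } });
  }
  return null; // continue to the Batch, Verify or Lock handlers
}
```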
Batch API Handling
If the HTTP method is `POST` and the `action` is `objects/batch`, the Batch API handling is executed.
Verify Authentication
The `Authorization` header is retrieved and the handler determines if the user has read-only (download) or read/write (upload) access.
If the user has no access at all, a HTTP 401 (Unauthorized) error is returned.
Verifying Request
Several headers in the request are verified according to the Git LFS API specifications:
- `Accept` header must be `application/vnd.git-lfs+json`, else a HTTP 406 (Accept Error) is returned.
- `Content-Type` header must be `application/vnd.git-lfs+json`, else a HTTP 422 (Content-Type Error) is returned.
- `Host` header must be set, else a HTTP 422 (Host Error) is returned.
If no body is found in the request, a HTTP 422 (Validation Error) is returned.
To facilitate the handling of the Batch API, the Batch API request and response objects are verified against the proper schemas. This is done using the `ts-interface-checker` library, which is used to build a proper schema from an interface definition based on the Batch request and response schemas.
If the request does not match the schema, a HTTP 422 error is returned, including details of what part of the schema check failed.
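A sketch of how that check could look with `ts-interface-checker`; the generated type-suite module (`batch-ti`) and the checker name (`BatchRequest`) are assumptions based on how the library is typically used together with `ts-interface-builder`:

```typescript
import { createCheckers } from "ts-interface-checker";
// Type suite generated by ts-interface-builder from the Batch request/response interfaces.
import batchTI from "./batch-ti";

const checkers = createCheckers(batchTI);

// Returns null if the body matches the schema, or a message describing the failure.
function validateBatchRequest(body: unknown): string | null {
  try {
    checkers.BatchRequest.check(body); // throws a descriptive error on mismatch
    return null;
  } catch (err) {
    return err instanceof Error ? err.message : "Schema validation failed";
  }
}
```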
Building the Response
Each object in the `objects` property of the request is individually handled, and based on the `operation` property of the request, a different response is prepared for the object.
The handler will build a list of objects and their actions to be included in the final Batch API Response.
Every time an R2 URL is signed, the composition of the URL is:

- `URL` is set to `R2_BUCKET_URL`
- `pathname` is set to `<R2_BUCKET_NAME>/<repoUser>/<repository>/<oid>`, where `repoUser` is currently always `_GLOBAL_`
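In code, that composition could look roughly like this; the function name is illustrative:

```typescript
// Sketch: compose the R2 URL that will be signed for an object.
function buildObjectUrl(
  env: { R2_BUCKET_URL: string; R2_BUCKET_NAME: string },
  repository: string,
  oid: string,
): URL {
  const repoUser = "_GLOBAL_"; // currently always _GLOBAL_
  const url = new URL(env.R2_BUCKET_URL);
  url.pathname = `/${env.R2_BUCKET_NAME}/${repoUser}/${repository}/${oid}`;
  return url;
}
```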
download
If the `operation` is `download`, the handler will use the `AwsClient` to sign a request for a HTTP GET method with the URL pointing to the object `oid` and proper location, and an expiration within `VALIDITY_DURATION` time.
upload
If the `operation` is `upload`, the handler will use the `AwsClient` to sign a request for a HTTP PUT method with the URL pointing to the object `oid` and proper location, and an expiration within `VALIDITY_DURATION` time.

The handler will also generate a URL for the Verify API for this object, including a proper `X-FBW-GITLFS-TOKEN` header.
In case the user has no write access to the object, a 403 error will be raised for that specific object, stating "No write access to object".
Final Response
After generating the correct actions for each object, a Batch API Response object is built and returned to the client.
Verify API Handling
When a `POST` HTTP request is received with a proper `verify` path, the handler will verify that the `X-FBW-GITLFS-TOKEN` is valid (expiration and authentication). If not, a HTTP 422 Error (Unauthorized) is returned.
Similarly, if no body in the request is found, a HTTP 422 Error (Validation Error) is returned.
For a valid request, the handler will execute a `head` request to the R2 bucket for the object and compare the returned size with the size provided in the request. If these match, a HTTP 200 response is returned. If these do not match, a HTTP 500 Error (Verify Error) is returned.
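A sketch of that check, assuming the R2 bucket binding (`bucket_name`) is used for the `head` lookup and the repository was parsed from the request path:

```typescript
// Sketch: compare the object size from the verify request with the size stored in R2.
async function handleVerify(
  env: { bucket_name: R2Bucket },
  repository: string,
  body: { oid: string; size: number },
): Promise<Response> {
  const head = await env.bucket_name.head(`_GLOBAL_/${repository}/${body.oid}`);
  if (head !== null && head.size === body.size) {
    return new Response(null, { status: 200 });
  }
  return new Response(JSON.stringify({ message: "Verify failed" }), {
    status: 500,
    headers: { "Content-Type": "application/vnd.git-lfs+json" },
  });
}
```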
Lock API Handling
Currently the Lock API is not supported, so any request for the Lock API is responded to with a HTTP 404 Error (Not found).
Fallback Response
In case a request is not handled by any of the previous handlers, a HTTP 501 Error (Method not implemented) is returned.
Setup and Deployment
R2 Bucket Configuration
A Cloudflare R2 Bucket is needed and should be reachable from a controlled URL. This does not need to be a custom domain.
An API token needs to be created with Object Read & Write access to the R2 Bucket.
To configure the Worker correctly, set the following configuration options:
- `R2_BUCKET_URL` - Set this to the S3 API URL, without the name of the bucket
- `R2_BUCKET_NAME` - Set this to the name of the R2 Bucket
- `R2_BUCKET_KEY_ID` - Set this to the API token ID
- `R2_BUCKET_KEY_SECRET` - Set this to the API token Secret
The `R2_BUCKET_URL` can initially cause some confusion. As an example: if the S3 API URL for the R2 Bucket is `https://8f6b3614f05e66b371351cd51293e44b.r2.cloudflarestorage.com/fbw-git-lfs-bucket`, then the `R2_BUCKET_URL` must be set to `https://8f6b3614f05e66b371351cd51293e44b.r2.cloudflarestorage.com` (do not include `fbw-git-lfs-bucket`).
Worker Configuration
A Cloudflare Worker can either be created manually before deploying the code, using the Cloudflare UI (or API), or Wrangler can be used to deploy the Worker directly from the code.
What is important to verify is:
- Correct R2 Bucket bindings to the correct R2 Bucket
- Configure a custom domain in case you want the Worker to be accessible over an easy to remember URL
- Check the environment variables, and make sure to encrypt all the sensitive data (anything with `KEY` in the name)
Deployment
- Make sure to install Wrangler: `npm install -g wrangler`
- Update the `wrangler.toml` file with the appropriate configuration (a sketch is shown after this list).
- Use Wrangler to deploy the LFS server: `wrangler deploy`
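A sketch of what such a `wrangler.toml` could contain; all names and values are placeholders, and the sensitive keys should be stored as encrypted secrets rather than plain vars:

```toml
# Sketch of a wrangler.toml; names and values are placeholders.
name = "git-lfs-worker"
main = "src/index.ts"
compatibility_date = "2024-01-01"

[vars]
R2_BUCKET_URL = "https://<account-id>.r2.cloudflarestorage.com"
R2_BUCKET_NAME = "<bucket-name>"
VALIDITY_DURATION = "3600"
# DOWNLOAD_KEY, UPLOAD_KEY, VERIFY_KEY, R2_BUCKET_KEY_ID and R2_BUCKET_KEY_SECRET
# should be added with `wrangler secret put <NAME>` instead of being listed here.

[[r2_buckets]]
binding = "bucket_name"
bucket_name = "<bucket-name>"
```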
Troubleshooting
You can run a local server running your code using `wrangler dev`.
Or you can tail the logs of the deployed Worker using `wrangler tail`.