How repo-updater works
Purpose
Sourcegraph mirrors repositories from code hosts. Code hosts may be SaaS products, such as GitHub or AWS CodeCommit, or local installations that are private to a customer’s environment. The repo-updater service schedules repository synchronization activities using gitserver and any configured code hosts.
Overview
A repo-updater instance exposes an HTTP server as its primary interface. Clients use this interface to schedule synchronization of the following:
- Code hosts
- Repositories
- Repository permissions
Although the majority of Git operations are issued directly to gitserver, clones and fetches are routed through repo-updater to ensure that code host limits and other concerns are respected.
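One way to picture why routing clones and fetches through a single service helps is a per-host concurrency cap. The Go sketch below assumes a fixed number of concurrent operations allowed per code host and enforces it with a buffered channel used as a semaphore. The hostLimiter type and the limit value are hypothetical, and real code host limits are often rate-based (requests per time window) rather than concurrency-based.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// hostLimiter caps how many clone/fetch operations may run at once
// against each code host. The cap is an illustrative assumption.
type hostLimiter struct {
	mu    sync.Mutex
	limit int
	sems  map[string]chan struct{}
}

func newHostLimiter(limit int) *hostLimiter {
	return &hostLimiter{limit: limit, sems: make(map[string]chan struct{})}
}

// slots returns the semaphore for host, creating it on first use.
func (l *hostLimiter) slots(host string) chan struct{} {
	l.mu.Lock()
	defer l.mu.Unlock()
	s, ok := l.sems[host]
	if !ok {
		s = make(chan struct{}, l.limit)
		l.sems[host] = s
	}
	return s
}

// Do blocks until a slot for host is free, runs fn, then frees the slot.
func (l *hostLimiter) Do(host string, fn func()) {
	s := l.slots(host)
	s <- struct{}{}
	defer func() { <-s }()
	fn()
}

// peakConcurrency runs n simulated fetches against host and reports the
// largest number that were in flight at the same time.
func peakConcurrency(l *hostLimiter, host string, n int) int {
	var (
		mu           sync.Mutex
		active, peak int
		wg           sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.Do(host, func() {
				mu.Lock()
				active++
				if active > peak {
					peak = active
				}
				mu.Unlock()
				time.Sleep(5 * time.Millisecond) // stand-in for a git fetch
				mu.Lock()
				active--
				mu.Unlock()
			})
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	l := newHostLimiter(2)
	// Never more than 2, however many fetches are requested.
	fmt.Println("peak concurrent fetches:", peakConcurrency(l, "github.com", 10))
}
```

Because every clone and fetch passes through one process, a single in-memory limiter like this is enough; spreading the work across many processes would require them to share limiter state.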
As noted earlier, Sourcegraph can integrate with a variety of code hosts. The Source interface abstracts the details of communicating with each one. For example, listing GitHub repositories works differently from listing GitLab repositories, but callers use the same interface for both.
The service’s key data structure is a priority queue of repository updates. It implements heap.Interface (which embeds sort.Interface) and works as follows:
- Updates are sorted using simple heuristics based on repository metadata
- Queue positions can be modified in response to explicit requests
- Priority levels can be set for permissions and authorization updates
- Updates are handled via background worker jobs
- The external_service_sync_jobs_with_next_sync_at view provides insight into the priority queue’s activity and current depth
Miscellaneous
Production instances
There is exactly one instance of repo-updater running, by design. This allows us to:
- Avoid expensive coordination between instances
- Respect the aforementioned code host limits
General dependencies
Before repo-updater can begin accepting work, it checks that the following services are running and respond to pings:
- frontend, checked through the internal API client
- gitserver instances, checked through the gitserver client
See “How gitserver works: Production instances” for more information.
Cloud-specific dependencies
If repo-updater is running in sourcegraph.com mode, it verifies that certain code hosts (specifically GitHub and GitLab) are properly configured. This is required so that repositories from those code hosts can be added automatically when users browse to them.
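A check like this can be sketched as comparing the configured code host kinds against a required set. The kind strings GITHUB and GITLAB and the missingCodeHosts helper are hypothetical; the document does not specify how code host configuration is represented.

```go
package main

import (
	"fmt"
	"strings"
)

// missingCodeHosts returns the required code host kinds that have no
// configured connection. Kind names are illustrative assumptions.
func missingCodeHosts(configured, required []string) []string {
	have := make(map[string]bool, len(configured))
	for _, k := range configured {
		have[k] = true
	}
	var missing []string
	for _, k := range required {
		if !have[k] {
			missing = append(missing, k)
		}
	}
	return missing
}

func main() {
	required := []string{"GITHUB", "GITLAB"}
	if m := missingCodeHosts([]string{"GITHUB"}, required); len(m) > 0 {
		fmt.Println("sourcegraph.com mode requires connections for:", strings.Join(m, ", "))
	}
}
```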
Useful metrics
We track a variety of metrics in repo-updater that you’ll want to familiarize yourself with. For example: