Scaling Sourcegraph on Kubernetes
Sourcegraph can scale to accommodate large codebases and many users.
Increase resources according to the Scaling Overview per Service if you notice slower search or navigation.
Cluster resource guidelines
For production environments, we recommend allocating resources based on your instance size. See our resource estimator for estimates.
Improving performance with a large number of repositories
Here is a simplified list of the key parameters to tune when scaling Sourcegraph to many repositories:
- sourcegraph-frontend CPU/memory resource allocations
- searcher replica count
- indexedSearch replica count and CPU/memory resource allocations
- gitserver replica count
- symbols replica count and CPU/memory resource allocations
- gitMaxConcurrentClones, because git clone and git fetch operations are IO and CPU-intensive
- repoListUpdateInterval (in minutes), because each interval triggers git fetch operations for all repositories
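To make the effect of repoListUpdateInterval concrete, here is a rough back-of-the-envelope sketch. It assumes only what is stated above (each interval triggers one git fetch per repository) and ignores any scheduling or backoff the real service may apply. The function name and the example repository count are hypothetical, chosen for illustration.

```python
def fetches_per_hour(repo_count: int, repo_list_update_interval_min: int) -> float:
    """Estimate git fetch operations per hour.

    Assumption: every repository is fetched exactly once per interval.
    """
    if repo_list_update_interval_min <= 0:
        raise ValueError("interval must be a positive number of minutes")
    return repo_count * 60 / repo_list_update_interval_min


# Hypothetical instance with 20,000 repositories.
print(fetches_per_hour(20_000, 1))   # 1200000.0
print(fetches_per_hour(20_000, 15))  # 80000.0
```

Under these assumptions, raising the interval from 1 to 15 minutes cuts the fetch load by a factor of 15, which is why this setting is worth revisiting as the repository count grows.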
Notes:
- If your change requires restarting gitserver pods and they are rescheduled to other nodes, they may go offline briefly (showing a Multi-Attach error). This is due to volume detach/reattach. Contact us for mitigation steps depending on your cloud provider.
- See the docs to understand each service's role:
Improving performance with large monorepos
Here is a simplified list of key parameters to tune when scaling Sourcegraph to large monorepos:
- sourcegraph-frontend CPU/memory resource allocations
- searcher CPU/memory resource allocations (allocate enough memory to hold all non-binary files in your repositories)
- indexedSearch CPU/memory resource allocations (for the zoekt-indexserver pod, allocate enough memory to hold all non-binary files in your largest repository; for the zoekt-webserver pod, allocate enough memory to hold ~2.7x the size of all non-binary files in your repositories)
- symbols CPU/memory resource allocations
- gitserver CPU/memory resource allocations (allocate enough memory to hold your Git packed bare repositories)
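The memory guidelines above reduce to simple arithmetic. The following is a minimal sketch of that calculation, assuming you have already measured the total size of non-binary files in each repository. The class and function names are hypothetical, the example sizes are made up, and the results are lower bounds taken directly from the rules of thumb above rather than exact requirements.

```python
from dataclasses import dataclass
from typing import Sequence

GIB = 1024 ** 3
ZOEKT_WEBSERVER_FACTOR = 2.7  # the "~2.7x" rule of thumb from the list above


@dataclass
class MemoryEstimate:
    searcher_bytes: float
    zoekt_indexserver_bytes: float
    zoekt_webserver_bytes: float


def estimate_memory(non_binary_bytes_per_repo: Sequence[int]) -> MemoryEstimate:
    """Apply the sizing guidelines to per-repository non-binary file sizes."""
    total = sum(non_binary_bytes_per_repo)
    largest = max(non_binary_bytes_per_repo, default=0)
    return MemoryEstimate(
        searcher_bytes=total,              # all non-binary files
        zoekt_indexserver_bytes=largest,   # largest single repository
        zoekt_webserver_bytes=ZOEKT_WEBSERVER_FACTOR * total,
    )


# Hypothetical instance: a 4 GiB monorepo plus two smaller repositories.
est = estimate_memory([4 * GIB, 1 * GIB, GIB // 2])
print(f"searcher:          {est.searcher_bytes / GIB:.2f} GiB")           # 5.50 GiB
print(f"zoekt-indexserver: {est.zoekt_indexserver_bytes / GIB:.2f} GiB")  # 4.00 GiB
print(f"zoekt-webserver:   {est.zoekt_webserver_bytes / GIB:.2f} GiB")    # 14.85 GiB
```

The gitserver guideline is not included because it depends on the size of the packed bare repositories, which is a separate measurement from the non-binary file sizes used here.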
Configuring faster disk I/O for caches
Many parts of Sourcegraph's infrastructure benefit from using SSDs for caches. This is especially important for search performance. By default, disk caches use the Kubernetes hostPath and have the same I/O speed as the underlying node's disk. Even if the node's default disk is an SSD, however, it is likely network-mounted rather than local.