Model Configuration
Learn how to configure Cody via modelConfiguration on a Sourcegraph
Enterprise instance.
modelConfiguration is the recommended way to configure chat and
autocomplete models in Sourcegraph v5.6.0 and later.
The modelConfiguration field in the Site config section allows you to configure Cody to use different LLM models for chat and autocomplete, enabling greater flexibility in selecting the best model for your needs.
Using this configuration, you get an LLM selector in the Cody chat with an Enterprise instance that allows you to select the model you want to use.
The LLM models available for a Sourcegraph Enterprise instance offer a combination of Sourcegraph-provided models and any custom models providers that you explicitly add to your Sourcegraph instance site configuration.
modelConfiguration
The model configuration for Cody is managed through the "modelConfiguration" field in the Site config section. It includes the following fields:
| Field | Description | 
|---|---|
| sourcegraph | Configures access to Sourcegraph-provided models available through Cody Gateway. | 
| providerOverrides | Configures access to models through your keys with most common LLM providers (BYOK) or hosted behind any openaicompatible endpoint | 
| modelOverrides | Extends or modifies the list of models Cody recognizes and their configurations. | 
| selfHostedModels | Adds models to Cody’s recognized models list with default configurations provided by Sourcegraph. Only available for certain models; general models can be configured in modelOverrides. | 
| defaultModels | Specifies the models assigned to each Cody feature (chat, fast chat, autocomplete). | 
Getting started with modelConfiguration
The recommended and easiest way to set up model configuration is using Sourcegraph-provided models through the Cody Gateway.
For a minimal configuration example, see Configure Sourcegraph-supplied models.
Sourcegraph-provided models
Sourcegraph-provided models, accessible through the Cody Gateway, are managed via your site configuration.
For most administrators, relying on these models alone ensures access to high-quality models without needing to manage specific configurations.
The use of these models is controlled through the "modelConfiguration.sourcegraph" field in the site config.
Configure Sourcegraph-provided models
The minimal configuration for Sourcegraph-provided models is:
JSON"cody.enabled": true, "modelConfiguration": { "sourcegraph": {} }
The above configuration sets up the following:
- Sourcegraph-provided models are enabled (sourcegraphis not set tonull)
- Requests to LLM providers are routed through the Cody Gateway (no providerOverridesfield specified)
- Sourcegraph-defined default models are used for Cody features (no defaultModelsfield specified)
There are three main settings for configuring Sourcegraph-provided LLM models:
| Field | Description | 
|---|---|
| endpoint | (Optional) The URL for connecting to Cody Gateway. The default is set to the production instance. | 
| accessToken | (Optional) The access token for connecting to Cody Gateway defaults to the current license key. | 
| modelFilters | (Optional) Filters specifying which models to include from Cody Gateway. | 
Disable Sourcegraph-provided models
To disable all Sourcegraph-provided models and use only the models explicitly defined in your site configuration, set the "sourcegraph" field to null as shown in the example below.
JSON"cody.enabled": true, "modelConfiguration": { "sourcegraph": null, // ignore Sourcegraph-provided models "providerOverrides": { // define access to the LLM providers }, "modelOverrides": { // define models available via providers defined in the providerOverrides }, "defaultModels": { // set default models per Cody feature from the list of models defined in modelOverrides } }
Default models
The "modelConfiguration" setting includes a "defaultModels" field, which allows you to specify the LLM model used for each Cody feature ("chat", "fastChat", and "codeCompletion"). The values for each feature should be modelRefs of either Sourcegraph-provided models or models configured in the modelOverrides section.
If no default is specified or the specified model is not found, the configuration will silently fall back to a suitable alternative.
Model Filters
The "modelFilters" section allows you to control which Cody Gateway models are available to users of your Sourcegraph Enterprise instance. The following table describes each field:
| Field | Description | 
|---|---|
| statusFilter | Filters models based on their release status, such as stable, beta, deprecated or experimental. By default, all models available on Cody Gateway are accessible. | 
| allow | An array of modelRefs specifying which models to include. Supports wildcards. | 
| deny | An array of modelRefs specifying which models to exclude. Supports wildcards. | 
The following examples demonstrate how to use each of these settings together:
JSON"cody.enabled": true, "modelConfiguration": { "sourcegraph": { "modelFilters": { // Only allow "beta" and "stable" models. // Not "experimental" or "deprecated". "statusFilter": ["beta", "stable"], // Allow any models provided by Anthropic, OpenAI, Google and Fireworks. "allow": [ "anthropic::*", // Anthropic models "openai::*", // OpenAI models "google::*", // Google Gemini models "fireworks::*", // Open-source models hosted by Sourcegraph on Fireworks.ai (typically used for autocomplete and Mistral Chat models) ], // Example: Do not include any models with the Model ID containing "turbo", // or any models from a hypothetical provider "AcmeCo" "deny": [ "*turbo*", "acmeco::*" ] } } }
Provider Overrides
A provider is an organizational concept for grouping LLM models. Typically, a provider refers to the company that produced the model or the specific API/service used to access it, serving as a namespace.
By defining a provider override in your Sourcegraph site configuration, you can introduce a new namespace to organize models or customize the existing provider namespace supplied by Sourcegraph (e.g., for all "anthropic" models).
Provider overrides are configured via the "modelConfiguration.providerOverrides" field in the site configuration.
This field is an array of items, each containing the following fields:
| Field | Description | 
|---|---|
| id | The namespace for models accessed via the provider. | 
| displayName | A human-readable name for the provider. | 
| serverSideConfig | Defines how to access the provider. See the section below for additional details. | 
Example configuration:
JSON"cody.enabled": true, "modelConfiguration": { "sourcegraph": {}, "providerOverrides": [ { "id": "openai", "displayName": "OpenAI (via BYOK)", "serverSideConfig": { "type": "openai", "accessToken": "token", "endpoint": "https://api.openai.com" } }, ], "defaultModels": { "chat": "google::v1::gemini-1.5-pro", "fastChat": "anthropic::2023-06-01::claude-3-haiku", "codeCompletion": "fireworks::v1::deepseek-coder-v2-lite-base" } }
In the example above:
- Sourcegraph-provided models are enabled (sourcegraphis not set tonull)
- The "openai"provider configuration is overridden to be accessed using your own key, with the provider API accessed directly via the specifiedendpointURL. In contrast, models from the"anthropic"provider are accessed through the Cody Gateway
Refer to the examples page for additional examples.
Server-side config
The most important part of a provider's configuration is the "serverSideConfig" field, which defines how the LLM models should be invoked, i.e., which external service or API will handle the LLM requests.
Sourcegraph natively supports several types of LLM API providers. The current set of supported providers includes:
| Provider type | Description | 
|---|---|
| "sourcegraph" | Cody Gateway, offering access to various models from multiple services | 
| "openaicompatible" | Any OpenAI-compatible API implementation | 
| "awsBedrock" | Amazon Bedrock | 
| "azureOpenAI" | Microsoft Azure OpenAI | 
| "anthropic" | Anthropic | 
| "fireworks" | Fireworks AI | 
| "google" | Google Gemini and Vertex | 
| "openai" | OpenAI | 
| "huggingface-tgi" | Hugging Face Text Generation Interface | 
The configuration for serverSideConfig varies by provider type.
Refer to the examples page for examples.
Model Overrides
With a provider defined (either a Sourcegraph-provided provider or a custom provider configured via the providerOverrides field), custom models can be specified for that provider by adding them to the "modelConfiguration.modelOverrides" section.
This field is an array of items, each with the following fields:
- 
modelRef: Uniquely identifies the model within the provider namespace- A string in the format ${providerId}::${apiVersionId}::${modelId}
- To associate a model with your provider, ${providerId}must match the provider’s ID
- ${modelId}can be any URL-safe string
- ${apiVersionId}specifies the API version, which helps detect compatibility issues between models and Sourcegraph instances. For example,- "2023-06-01"can indicate that the model uses that version of the Anthropic API. If unsure, you may set this to- "unknown"when defining custom models
 
- A string in the format 
- 
displayName: An optional, user-friendly name for the model. If not set, clients should display theModelIDpart of themodelRefinstead (not themodelName)
- 
modelName: A unique identifier the API provider uses to specify which model is being invoked. This is the identifier that the LLM provider recognizes to determine the model you are calling
- 
capabilities: A list of capabilities that the model supports. Supported values:autocomplete,chat,vision,reasoning,edit,tools.
- 
category: Specifies the model's category with the following options:- "balanced": Typically the best default choice for most users. This category is suited for models like Sonnet 3.5 (as of October 2024)
- "speed": Ideal for low-parameter models that may not suit general-purpose chat but are beneficial for specialized tasks, such as query rewriting
- "accuracy": Reserved for models, like OpenAI o1, that use advanced reasoning techniques to improve response accuracy, though with slower latency
- "other": Used for older models without distinct advantages in reasoning or speed. Select this category if you are uncertain about which category to choose
- "deprecated": For models that are no longer supported by the provider and are filtered out on the client side (not available for use)
 
- 
contextWindow: An object that defines the number of tokens (units of text) that can be sent to the LLM. This setting influences response time and request cost and may vary according to the limits set by each LLM model or provider. It includes two fields:- maxInputTokens: Specifies the maximum number of tokens for the contextual data in the prompt (e.g., question, relevant snippets)
- maxOutputTokens: Specifies the maximum number of tokens allowed in the response
 
- 
reasoningEffort: Specifies the effort on reasoning for reasoning models (havingreasoningcapability). Supported values:high,medium,low. How this value is treated depends on the specific provider. For example, for Anthropic models supporting thinking,loweffort means that the minimumthinking.budget_tokensvalue (1024) will be used. For otherreasoningEffortvalues, thecontextWindow.maxOutputTokens / 2value will be used. For OpenAI reasoning models, thereasoningEffortfield value corresponds to thereasoning_effortrequest body value.
- 
serverSideConfig: Additional configuration for the model. It can be one of the following:- awsBedrockProvisionedThroughput: Specifies provisioned throughput settings for AWS Bedrock models with the following fields:- type: Must be- "awsBedrockProvisionedThroughput"
- arn: The ARN (Amazon Resource Name) for provisioned throughput to use when sending requests to AWS Bedrock. The ARN format for provisioned models is:- arn:${Partition}:bedrock:${Region}:${Account}:provisioned-model/${ResourceId}.
 
- openaicompatible: Configuration specific to models provided by an OpenAI-compatible provider, with the following fields:- type: Must be- "openaicompatible"
- apiModel: The literal string value of the- modelfield to be sent to the- /chat/completionsAPI. If set, Sourcegraph treats this as an opaque string and sends it directly to the API without inferring any additional information. By default, the configured model name is sent
 
 
Example configuration
JSON"cody.enabled": true, "modelConfiguration": { "sourcegraph": {}, "providerOverrides": [ { "id": "huggingface-codellama", "displayName": "huggingface", "serverSideConfig": { "type": "openaicompatible", // optional: disable the use of /completions for autocomplete requests, instead using // only /chat/completions. (available in Sourcegraph 6.4+ and 6.3.2692) // // "useLegacyCompletions": false, "endpoints": [ { "url": "https://api-inference.huggingface.co/models/meta-llama/CodeLlama-7b-hf/v1/", "accessToken": "token" // optional: send custom headers (in which case accessToken above is not used) // (available in Sourcegraph 6.4+ and 6.3.2692) // // "headers": { "X-api-key": "foo", "My-Custom-Http-Header": "bar" }, } ] } } ], "modelOverrides": [ { "modelRef": "huggingface-codellama::v1::CodeLlama-7b-hf", "modelName": "CodeLlama-7b-hf", "displayName": "(HuggingFace) CodeLlama-7b-hf", "contextWindow": { "maxInputTokens": 8192, "maxOutputTokens": 4096 }, "serverSideConfig": { "type": "openaicompatible", "apiModel": "meta-llama/CodeLlama-7b-hf" }, "clientSideConfig": { "openaicompatible": {} }, "capabilities": ["autocomplete", "chat"], "category": "balanced", "status": "stable" } ], "defaultModels": { "chat": "google::v1::gemini-1.5-pro", "fastChat": "anthropic::2023-06-01::claude-3-haiku", "codeCompletion": "huggingface-codellama::v1::CodeLlama-7b-hf" } }
In the example above:
- Sourcegraph-provided models are enabled (sourcegraphis not set tonull)
- An additional provider, "huggingface-codellama", is configured to access Hugging Face’s OpenAI-compatible API directly
- A custom model, "CodeLlama-7b-hf", is added using the"huggingface-codellama"provider
- Default models are set up as follows:
- Sourcegraph-provided models are used for "chat"and"fastChat"(accessed via Cody Gateway)
- The newly configured model, "huggingface-codellama::v1::CodeLlama-7b-hf", is used for"codeCompletion"(connecting directly to Hugging Face’s OpenAI-compatible API)
 
- Sourcegraph-provided models are used for 
Example configuration with Claude 3.7 Sonnet
JSON{ "modelRef": "anthropic::2024-10-22::claude-3-7-sonnet-latest", "displayName": "Claude 3.7 Sonnet", "modelName": "claude-3-7-sonnet-latest", "capabilities": [ "chat", "reasoning" ], "category": "accuracy", "status": "stable", "tier": "pro", "contextWindow": { "maxInputTokens": 45000, "maxOutputTokens": 4000 }, "modelCost": { "unit": "mtok", "inputTokenPennies": 300, "outputTokenPennies": 1500 }, "reasoningEffort": "high" },
In this modelOverrides config example:
- The model is configured to use Claude 3.7 Sonnet with Cody Gateway
- The model is configured to use the "chat"and"reasoning"capabilities
- The reasoningEffortcan be set to 3 different options in the Model Config. These options arehigh,mediumandlow
- The default reasoningEffortis set tolow
- For Anthropic models supporting thinking, when the reasoning effort is low, 1024 tokens is used as the thinking budget. Withmediumandhighthe thinking budget is set to half of the maxOutputTokens value
Refer to the examples page for additional examples.
View configuration
To view the current model configuration, run the following command:
BASHexport INSTANCE_URL="https://sourcegraph.test:3443" # Replace with your Sourcegraph instance URL export ACCESS_TOKEN="your access token" curl --location "${INSTANCE_URL}/.api/modelconfig/supported-models.json" \ --header "Authorization: token $ACCESS_TOKEN"
The response includes:
- Configured providers and models—both Sourcegraph-provided (if enabled, with any applied filters) and any overrides
- Default models for Cody features
Example response
JSON"schemaVersion": "1.0", "revision": "0.0.0+dev", "providers": [ { "id": "anthropic", "displayName": "Anthropic" }, { "id": "fireworks", "displayName": "Fireworks" }, { "id": "google", "displayName": "Google" }, { "id": "openai", "displayName": "OpenAI" }, { "id": "mistral", "displayName": "Mistral" } ], "models": [ { "modelRef": "anthropic::2024-10-22::claude-3-5-sonnet-latest", "displayName": "Claude 3.5 Sonnet (Latest)", "modelName": "claude-3-5-sonnet-latest", "capabilities": ["chat"], "category": "accuracy", "status": "stable", "contextWindow": { "maxInputTokens": 45000, "maxOutputTokens": 4000 } }, { "modelRef": "anthropic::2023-06-01::claude-3-haiku", "displayName": "Claude 3 Haiku", "modelName": "claude-3-haiku-20240307", "capabilities": ["chat"], "category": "speed", "status": "stable", "contextWindow": { "maxInputTokens": 7000, "maxOutputTokens": 4000 } }, { "modelRef": "fireworks::v1::starcoder", "displayName": "StarCoder", "modelName": "starcoder", "capabilities": ["autocomplete"], "category": "speed", "status": "stable", "contextWindow": { "maxInputTokens": 2048, "maxOutputTokens": 256 } }, { "modelRef": "fireworks::v1::deepseek-coder-v2-lite-base", "displayName": "DeepSeek V2 Lite Base", "modelName": "accounts/sourcegraph/models/deepseek-coder-v2-lite-base", "capabilities": ["autocomplete"], "category": "speed", "status": "stable", "contextWindow": { "maxInputTokens": 2048, "maxOutputTokens": 256 } }, { "modelRef": "google::v1::gemini-1.5-pro", "displayName": "Gemini 1.5 Pro", "modelName": "gemini-1.5-pro", "capabilities": ["chat"], "category": "balanced", "status": "stable", "contextWindow": { "maxInputTokens": 45000, "maxOutputTokens": 4000 } }, { "modelRef": "openai::2024-02-01::gpt-4o", "displayName": "GPT-4o", "modelName": "gpt-4o", "capabilities": ["chat"], "category": "accuracy", "status": "stable", "contextWindow": { "maxInputTokens": 45000, "maxOutputTokens": 4000 } }, { "modelRef": "openai::2024-02-01::cody-chat-preview-001", "displayName": "OpenAI o1-preview", "modelName": "cody-chat-preview-001", "capabilities": ["chat"], "category": "accuracy", "status": "waitlist", "contextWindow": { "maxInputTokens": 45000, "maxOutputTokens": 4000 } }, ], "defaultModels": { "chat": "anthropic::2024-10-22::claude-3-5-sonnet-latest", "fastChat": "anthropic::2023-06-01::claude-3-haiku", "codeCompletion": "fireworks::v1::deepseek-coder-v2-lite-base" }
Reasoning models
Reasoning models can be added via modelOverrides in the site configuration by adding the reasoning capability to the capabilities list, and setting the reasoningEffort field on the model. Both must be set for the models' reasoning functionality to be used (otherwise the base model without reasoning / exteded thinking will be used.)
For example, this modelOverride would create a Claude Sonnet 4 with Thinking option in the Cody model selector menu, and when the user chats with Cody with that model selected, it would use Claude Sonnet 4's Extended Thinking support with a low reasoning effort for the users' chat:
JSON{ "modelRef": "bedrock::2024-10-22::claude-sonnet-4-thinking-latest", "displayName": "Claude Sonnet 4 with Thinking", "modelName": "claude-sonnet-4-20250514", "contextWindow": { "maxInputTokens": 93000, "maxOutputTokens": 64000, "maxUserInputTokens": 18000 }, "capabilities": [ "chat", "reasoning" ], "reasoningEffort": "low", "category": "accuracy", "status": "stable" }