Merge pull request #1983 from qodo-ai/tr/benchmark

Tr/benchmark
Tal 2025-08-08 08:40:35 +03:00 committed by GitHub
commit 8e36f46dae
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
9 changed files with 34 additions and 17 deletions

@ -208,7 +208,7 @@ ___
 ## Try It Now
-Try the Claude Sonnet powered PR-Agent instantly on _your public GitHub repository_. Just mention `@CodiumAI-Agent` and add the desired command in any PR comment. The agent will generate a response based on your command.
+Try the GPT-5 powered PR-Agent instantly on _your public GitHub repository_. Just mention `@CodiumAI-Agent` and add the desired command in any PR comment. The agent will generate a response based on your command.
 For example, add a comment to any pull request with the following text:
 ```

@ -2,7 +2,7 @@
 With a single-click installation you will gain access to a context-aware chat on your pull requests code, a toolbar extension with multiple AI feedbacks, Qodo Merge filters, and additional abilities.
-The extension is powered by top code models like Claude 3.7 Sonnet and o4-mini. All the extension's features are free to use on public repositories.
+The extension is powered by top code models like GPT-5. All the extension's features are free to use on public repositories.
 For private repositories, you will need to install [Qodo Merge](https://github.com/apps/qodo-merge-pro){:target="_blank"} in addition to the extension.
 For a demonstration of how to install Qodo Merge and use it with the Chrome extension, please refer to the tutorial video at the provided [link](https://codium.ai/images/pr_agent/private_repos.mp4){:target="_blank"}.

@ -26,7 +26,7 @@ ___
 #### Answer:<span style="display:none;">2</span>
-- Modern AI models, like Claude Sonnet and GPT-4, are improving rapidly but remain imperfect. Users should critically evaluate all suggestions rather than accepting them automatically.
+- Modern AI models, like Claude Sonnet and GPT-5, are improving rapidly but remain imperfect. Users should critically evaluate all suggestions rather than accepting them automatically.
 - AI errors are rare, but possible. A main value from reviewing the code suggestions lies in their high probability of catching **mistakes or bugs made by the PR author**. We believe it's worth spending 30-60 seconds reviewing suggestions, even if some aren't relevant, as this practice can enhance code quality and prevent bugs in production.

@ -24,7 +24,7 @@ Here are some of the additional features and capabilities that Qodo Merge offers
 | Feature | Description |
 | -------------------------------------------------------------------------------------------------------------------- |--------------------------------------------------------------------------------------------------------------------------------------------------------|
-| [**Model selection**](https://qodo-merge-docs.qodo.ai/usage-guide/PR_agent_pro_models/) | Choose the model that best fits your needs, among top models like `Claude Sonnet`, `o4-mini` |
+| [**Model selection**](https://qodo-merge-docs.qodo.ai/usage-guide/PR_agent_pro_models/) | Choose the model that best fits your needs |
 | [**Global and wiki configuration**](https://qodo-merge-docs.qodo.ai/usage-guide/configuration_options/) | Control configurations for many repositories from a single location; <br>Edit configuration of a single repo without committing code |
 | [**Apply suggestions**](https://qodo-merge-docs.qodo.ai/tools/improve/#overview) | Generate committable code from the relevant suggestions interactively by clicking on a checkbox |
 | [**Suggestions impact**](https://qodo-merge-docs.qodo.ai/tools/improve/#assessing-impact) | Automatically mark suggestions that were implemented by the user (either directly in GitHub, or indirectly in the IDE) to enable tracking of the impact of the suggestions |

@ -34,6 +34,24 @@ A list of the models used for generating the baseline suggestions, and example r
 </tr>
 </thead>
 <tbody>
+<tr>
+<td style="text-align:left;">GPT-5</td>
+<td style="text-align:left;">2025-08-07</td>
+<td style="text-align:left;">medium</td>
+<td style="text-align:center;"><b>72.2</b></td>
+</tr>
+<tr>
+<td style="text-align:left;">GPT-5</td>
+<td style="text-align:left;">2025-08-07</td>
+<td style="text-align:left;">low</td>
+<td style="text-align:center;"><b>67.8</b></td>
+</tr>
+<tr>
+<td style="text-align:left;">GPT-5</td>
+<td style="text-align:left;">2025-08-07</td>
+<td style="text-align:left;">minimal</td>
+<td style="text-align:center;"><b>62.7</b></td>
+</tr>
 <tr>
 <td style="text-align:left;">o3</td>
 <td style="text-align:left;">2025-04-16</td>

@ -107,7 +107,7 @@ Please note that the `custom_model_max_tokens` setting should be configured in a
 !!! note "Local models vs commercial models"
 Qodo Merge is compatible with almost any AI model, but analyzing complex code repositories and pull requests requires a model specifically optimized for code analysis.
-Commercial models such as GPT-4, Claude Sonnet, and Gemini have demonstrated robust capabilities in generating structured output for code analysis tasks with large input. In contrast, most open-source models currently available (as of January 2025) face challenges with these complex tasks.
+Commercial models such as GPT-5, Claude Sonnet, and Gemini have demonstrated robust capabilities in generating structured output for code analysis tasks with large input. In contrast, most open-source models currently available (as of January 2025) face challenges with these complex tasks.
 Based on our testing, local open-source models are suitable for experimentation and learning purposes (mainly for the `ask` command), but they are not suitable for production-level code analysis tasks.

@ -1,5 +1,5 @@
-The default models used by Qodo Merge (June 2025) are a combination of Claude Sonnet 4 and Gemini 2.5 Pro.
+The default models used by Qodo Merge (June 2025) are a combination of GPT-5 and Gemini 2.5 Pro.
 ### Selecting a Specific Model
@ -19,11 +19,11 @@ To restrict Qodo Merge to using only `o4-mini`, add this setting:
 model="o4-mini"
 ```
-To restrict Qodo Merge to using only `GPT-4.1`, add this setting:
+To restrict Qodo Merge to using only `GPT-5`, add this setting:
 ```toml
 [config]
-model="gpt-4.1"
+model="gpt-5"
 ```
 To restrict Qodo Merge to using only `gemini-2.5-pro`, add this setting:
@ -33,10 +33,9 @@ To restrict Qodo Merge to using only `gemini-2.5-pro`, add this setting:
 model="gemini-2.5-pro"
 ```
-To restrict Qodo Merge to using only `deepseek-r1` us-hosted, add this setting:
+To restrict Qodo Merge to using only `claude-4-sonnet`, add this setting:
 ```toml
 [config]
-model="deepseek/r1"
+model="claude-4-sonnet"
 ```

@ -325,16 +325,16 @@ class LiteLLMAIHandler(BaseAiHandler):
 "api_base": self.api_base,
 }
-if thinking_kwargs_gpt5:
-kwargs.update(thinking_kwargs_gpt5)
-if 'temperature' in kwargs:
-del kwargs['temperature']
 # Add temperature only if model supports it
 if model not in self.no_support_temperature_models and not get_settings().config.custom_reasoning_model:
 # get_logger().info(f"Adding temperature with value {temperature} to model {model}.")
 kwargs["temperature"] = temperature
+if thinking_kwargs_gpt5:
+kwargs.update(thinking_kwargs_gpt5)
+if 'temperature' in kwargs:
+del kwargs['temperature']
 # Add reasoning_effort if model supports it
 if (model in self.support_reasoning_models):
 supported_reasoning_efforts = [ReasoningEffort.HIGH.value, ReasoningEffort.MEDIUM.value, ReasoningEffort.LOW.value]
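
The hunk above moves the `thinking_kwargs_gpt5` block below the point where `temperature` is added. In the old order, `temperature` was deleted first. Whenever the temperature condition held, it was then added back afterwards, so the cleanup had no effect. In the new order the deletion runs last.

The sketch below compares the two orderings in a simplified handler. Several parts are hypothetical:
- the function names;
- the plain `set` standing in for `self.no_support_temperature_models`;
- the sample contents of `thinking_kwargs_gpt5`.

The `custom_reasoning_model` settings check is also omitted.

```python
def build_kwargs_old(model, temperature, thinking_kwargs_gpt5, no_temperature_models):
    """Old ordering: strip temperature first, then (maybe) add it back."""
    kwargs = {"model": model}
    if thinking_kwargs_gpt5:
        kwargs.update(thinking_kwargs_gpt5)
        if "temperature" in kwargs:
            del kwargs["temperature"]
    # Temperature is added after the GPT-5 cleanup, so the cleanup is undone.
    if model not in no_temperature_models:
        kwargs["temperature"] = temperature
    return kwargs


def build_kwargs_new(model, temperature, thinking_kwargs_gpt5, no_temperature_models):
    """New ordering: add temperature first, then let the GPT-5 block remove it."""
    kwargs = {"model": model}
    if model not in no_temperature_models:
        kwargs["temperature"] = temperature
    if thinking_kwargs_gpt5:
        kwargs.update(thinking_kwargs_gpt5)
        if "temperature" in kwargs:
            del kwargs["temperature"]
    return kwargs


if __name__ == "__main__":
    # Hypothetical GPT-5 extras; the real contents of thinking_kwargs_gpt5
    # are not shown in this diff.
    gpt5_extras = {"reasoning_effort": "medium"}
    print(build_kwargs_old("gpt-5", 0.2, gpt5_extras, set()))
    # {'model': 'gpt-5', 'reasoning_effort': 'medium', 'temperature': 0.2}
    print(build_kwargs_new("gpt-5", 0.2, gpt5_extras, set()))
    # {'model': 'gpt-5', 'reasoning_effort': 'medium'}
```

A `del` only affects keys that exist at that moment, so the removal has to run after every step that might insert the key.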

@ -1,4 +1,4 @@
-aiohttp==3.9.5
+aiohttp==3.10.2
 anthropic>=0.52.0
 #anthropic[vertex]==0.47.1
 atlassian-python-api==3.41.4