Merge pull request #1983 from qodo-ai/tr/benchmark

Tr/benchmark
2025-12-11 18:35:18 +00:00 · 2025-08-08 08:40:35 +03:00
commit 8e36f46dae
parent 5162d847b3 de5c1adaa0
9 changed files with 34 additions and 17 deletions
--- a/README.md
+++ b/README.md
@@ -208,7 +208,7 @@ ___
 ## Try It Now
-Try the Claude Sonnet powered PR-Agent instantly on _your public GitHub repository_. Just mention `@CodiumAI-Agent` and add the desired command in any PR comment. The agent will generate a response based on your command.
+Try the GPT-5 powered PR-Agent instantly on _your public GitHub repository_. Just mention `@CodiumAI-Agent` and add the desired command in any PR comment. The agent will generate a response based on your command.
 For example, add a comment to any pull request with the following text:
 ```
--- a/docs/docs/chrome-extension/index.md
+++ b/docs/docs/chrome-extension/index.md
@@ -2,7 +2,7 @@
 With a single-click installation you will gain access to a context-aware chat on your pull requests code, a toolbar extension with multiple AI feedbacks, Qodo Merge filters, and additional abilities.
-The extension is powered by top code models like Claude 3.7 Sonnet and o4-mini. All the extension's features are free to use on public repositories.
+The extension is powered by top code models like GPT-5. All the extension's features are free to use on public repositories.
 For private repositories, you will need to install [Qodo Merge](https://github.com/apps/qodo-merge-pro){:target="_blank"} in addition to the extension.
 For a demonstration of how to install Qodo Merge and use it with the Chrome extension, please refer to the tutorial video at the provided [link](https://codium.ai/images/pr_agent/private_repos.mp4){:target="_blank"}.
--- a/docs/docs/faq/index.md
+++ b/docs/docs/faq/index.md
@@ -26,7 +26,7 @@ ___
    #### Answer:<span style="display:none;">2</span>
-    - Modern AI models, like Claude Sonnet and GPT-4, are improving rapidly but remain imperfect. Users should critically evaluate all suggestions rather than accepting them automatically.
+    - Modern AI models, like Claude Sonnet and GPT-5, are improving rapidly but remain imperfect. Users should critically evaluate all suggestions rather than accepting them automatically.
    - AI errors are rare, but possible. A main value from reviewing the code suggestions lies in their high probability of catching **mistakes or bugs made by the PR author**. We believe it's worth spending 30-60 seconds reviewing suggestions, even if some aren't relevant, as this practice can enhance code quality and prevent bugs in production.
--- a/docs/docs/overview/pr_agent_pro.md
+++ b/docs/docs/overview/pr_agent_pro.md
@@ -24,7 +24,7 @@ Here are some of the additional features and capabilities that Qodo Merge offers
 | Feature                                                                                                              | Description                                                                                                                                            |
 | -------------------------------------------------------------------------------------------------------------------- |--------------------------------------------------------------------------------------------------------------------------------------------------------|
-| [**Model selection**](https://qodo-merge-docs.qodo.ai/usage-guide/PR_agent_pro_models/)                              | Choose the model that best fits your needs, among top models like `Claude Sonnet`, `o4-mini`                                                           |
+| [**Model selection**](https://qodo-merge-docs.qodo.ai/usage-guide/PR_agent_pro_models/)                              | Choose the model that best fits your needs                                                         |
 | [**Global and wiki configuration**](https://qodo-merge-docs.qodo.ai/usage-guide/configuration_options/)              | Control configurations for many repositories from a single location; <br>Edit configuration of a single repo without committing code                   |
 | [**Apply suggestions**](https://qodo-merge-docs.qodo.ai/tools/improve/#overview)                                     | Generate committable code from the relevant suggestions interactively by clicking on a checkbox                                                        |
 | [**Suggestions impact**](https://qodo-merge-docs.qodo.ai/tools/improve/#assessing-impact)                            | Automatically mark suggestions that were implemented by the user (either directly in GitHub, or indirectly in the IDE) to enable tracking of the impact of the suggestions |
--- a/docs/docs/pr_benchmark/index.md
+++ b/docs/docs/pr_benchmark/index.md
@@ -34,6 +34,24 @@ A list of the models used for generating the baseline suggestions, and example r
    </tr>
  </thead>
  <tbody>
+    <tr>
+      <td style="text-align:left;">GPT-5</td>
+      <td style="text-align:left;">2025-08-07</td>
+      <td style="text-align:left;">medium</td>
+      <td style="text-align:center;"><b>72.2</b></td>
+    </tr>
+    <tr>
+      <td style="text-align:left;">GPT-5</td>
+      <td style="text-align:left;">2025-08-07</td>
+      <td style="text-align:left;">low</td>
+      <td style="text-align:center;"><b>67.8</b></td>
+    </tr>
+    <tr>
+      <td style="text-align:left;">GPT-5</td>
+      <td style="text-align:left;">2025-08-07</td>
+      <td style="text-align:left;">minimal</td>
+      <td style="text-align:center;"><b>62.7</b></td>
+    </tr>
    <tr>
      <td style="text-align:left;">o3</td>
      <td style="text-align:left;">2025-04-16</td>
--- a/docs/docs/usage-guide/changing_a_model.md
+++ b/docs/docs/usage-guide/changing_a_model.md
@@ -107,7 +107,7 @@ Please note that the `custom_model_max_tokens` setting should be configured in a
 !!! note "Local models vs commercial models"
    Qodo Merge is compatible with almost any AI model, but analyzing complex code repositories and pull requests requires a model specifically optimized for code analysis.
-    Commercial models such as GPT-4, Claude Sonnet, and Gemini have demonstrated robust capabilities in generating structured output for code analysis tasks with large input. In contrast, most open-source models currently available (as of January 2025) face challenges with these complex tasks.
+    Commercial models such as GPT-5, Claude Sonnet, and Gemini have demonstrated robust capabilities in generating structured output for code analysis tasks with large input. In contrast, most open-source models currently available (as of January 2025) face challenges with these complex tasks.
    Based on our testing, local open-source models are suitable for experimentation and learning purposes (mainly for the `ask` command), but they are not suitable for production-level code analysis tasks.
--- a/docs/docs/usage-guide/qodo_merge_models.md
+++ b/docs/docs/usage-guide/qodo_merge_models.md
@@ -1,5 +1,5 @@
-The default models used by Qodo Merge (June 2025) are a combination of Claude Sonnet 4 and Gemini 2.5 Pro.
+The default models used by Qodo Merge (June 2025) are a combination of GPT-5 and Gemini 2.5 Pro.
 ### Selecting a Specific Model
@@ -19,11 +19,11 @@ To restrict Qodo Merge to using only `o4-mini`, add this setting:
 model="o4-mini"
 ```
-To restrict Qodo Merge to using only `GPT-4.1`, add this setting:
+To restrict Qodo Merge to using only `GPT-5`, add this setting:
 ```toml
 [config]
-model="gpt-4.1"
+model="gpt-5"
 ```
 To restrict Qodo Merge to using only `gemini-2.5-pro`, add this setting:
@@ -33,10 +33,9 @@ To restrict Qodo Merge to using only `gemini-2.5-pro`, add this setting:
 model="gemini-2.5-pro"
 ```
-
+To restrict Qodo Merge to using only `claude-4-sonnet`, add this setting:
 To restrict Qodo Merge to using only `deepseek-r1` us-hosted, add this setting:
 ```toml
 [config]
-model="deepseek/r1"
+model="claude-4-sonnet"
 ```
--- a/pr_agent/algo/ai_handlers/litellm_ai_handler.py
+++ b/pr_agent/algo/ai_handlers/litellm_ai_handler.py
@@ -325,16 +325,16 @@ class LiteLLMAIHandler(BaseAiHandler):
                    "api_base": self.api_base,
                }
-            if thinking_kwargs_gpt5:
-                kwargs.update(thinking_kwargs_gpt5)
-                if 'temperature' in kwargs:
-                    del kwargs['temperature']
             # Add temperature only if model supports it
             if model not in self.no_support_temperature_models and not get_settings().config.custom_reasoning_model:
                 # get_logger().info(f"Adding temperature with value {temperature} to model {model}.")
                 kwargs["temperature"] = temperature
+            if thinking_kwargs_gpt5:
+                kwargs.update(thinking_kwargs_gpt5)
+                if 'temperature' in kwargs:
+                    del kwargs['temperature']
            # Add reasoning_effort if model supports it
            if (model in self.support_reasoning_models):
                supported_reasoning_efforts = [ReasoningEffort.HIGH.value, ReasoningEffort.MEDIUM.value, ReasoningEffort.LOW.value]
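
Note on the ordering in this hunk: if the temperature assignment runs after the `thinking_kwargs_gpt5` block, it re-adds the key that block just deleted. The standalone sketch below illustrates the order in which the delete takes effect. It is not the handler's code: `build_kwargs` and its parameters are hypothetical stand-ins for the handler's state, and the `reasoning_effort` key is only an assumed example of what `thinking_kwargs_gpt5` might contain.

```python
def build_kwargs(model, temperature, thinking_kwargs_gpt5=None,
                 no_support_temperature_models=frozenset(),
                 custom_reasoning_model=False):
    """Hypothetical helper mirroring the ordering discussed above."""
    kwargs = {"model": model}

    # Add temperature only if the model supports it.
    if model not in no_support_temperature_models and not custom_reasoning_model:
        kwargs["temperature"] = temperature

    # Apply the GPT-5 thinking kwargs last, so that removing
    # 'temperature' cannot be undone by the step above.
    if thinking_kwargs_gpt5:
        kwargs.update(thinking_kwargs_gpt5)
        kwargs.pop("temperature", None)

    return kwargs


# Assumed example payload; the real contents of thinking_kwargs_gpt5 may differ.
gpt5_kwargs = build_kwargs("gpt-5", 0.2, {"reasoning_effort": "low"})
plain_kwargs = build_kwargs("gpt-4.1", 0.2)
```

With this order, `gpt5_kwargs` carries no `temperature` key, while `plain_kwargs` keeps the value it was given.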
--- a/requirements.txt
+++ b/requirements.txt
@ -1,4 +1,4 @@
-aiohttp==3.9.5
+aiohttp==3.10.2
 anthropic>=0.52.0
 #anthropic[vertex]==0.47.1
 atlassian-python-api==3.41.4