docs: add Gemini-2.5-flash-preview benchmark comparisons to PR benchmark table

2025-12-12 02:45:18 +00:00 · 2025-05-14 07:35:09 +03:00
commit 72bcb0ec4c
parent 3ddd53d4fe
1 changed file with 13 additions and 0 deletions
--- a/docs/docs/pr_benchmark/index.md
+++ b/docs/docs/pr_benchmark/index.md
@@ -47,6 +47,18 @@ Here's a summary of the win rates based on the benchmark:
      <td style="text-align:left;">Gemini-2.5-pro-preview-05-06</td>
      <td style="text-align:left;">Sonnet 3.7</td>
      <td style="text-align:center; color: #1E8449;"><b>78.1%</b></td> <td style="text-align:center; color: #D8000C;"><b>21.9%</b></td> </tr>
+    <tr>
+      <td style="text-align:left;">Gemini-2.5-pro-preview-05-06</td>
+      <td style="text-align:left;">Gemini-2.5-flash-preview-04-17</td>
+      <td style="text-align:center; color: #1E8449;"><b>73.0%</b></td> <td style="text-align:center; color: #D8000C;"><b>27.0%</b></td> </tr>
+    <tr>
+      <td style="text-align:left;">Gemini-2.5-flash-preview-04-17</td>
+      <td style="text-align:left;">GPT-4.1</td>
+      <td style="text-align:center; color: #1E8449;"><b>54.6%</b></td> <td style="text-align:center; color: #D8000C;"><b>45.4%</b></td> </tr>
+    <tr>
+      <td style="text-align:left;">Gemini-2.5-flash-preview-04-17</td>
+      <td style="text-align:left;">Sonnet 3.7</td>
+      <td style="text-align:center; color: #1E8449;"><b>60.6%</b></td> <td style="text-align:center; color: #D8000C;"><b>39.4%</b></td> </tr>
    <tr>
      <td style="text-align:left;">GPT-4.1</td>
      <td style="text-align:left;">Sonnet 3.7</td>
@@ -54,6 +66,7 @@ Here's a summary of the win rates based on the benchmark:
  </tbody>
 </table>
 ## Gemini-2.5-pro-preview-05-06 - Model Card
 ### Comparison against GPT-4.1