Commit graph

444 commits

Author SHA1 Message Date
Timothy Jaeryang Baek
210197fd43 refac/fix: web/youtube file attachment handling 2025-09-13 00:02:48 +04:00
Timothy Jaeryang Baek
2185fc61c0 refac 2025-09-11 21:29:56 +04:00
Timothy Jaeryang Baek
485392fe63 chore: format 2025-09-09 18:19:31 +04:00
Tim Jaeryang Baek
71fd483fba
Merge pull request #17276 from Elettrotecnica/extend-docling-configuration
feat: Extend docling configuration options
2025-09-09 18:04:30 +04:00
Timothy Jaeryang Baek
0214c1e66c refac 2025-09-09 16:48:59 +04:00
Timothy Jaeryang Baek
5f0d262c59 fix: yt embed 2025-09-09 16:00:42 +04:00
Antonio Pisano
daa2a036f8 Extend docling configuration options to include:
* do_ocr
* force_ocr
* pdf_backend
* table_mode
* pipeline

as per https://github.com/docling-project/docling-serve/blob/main/docs/usage.md

See https://github.com/open-webui/open-webui/issues/17148
2025-09-08 18:51:33 +02:00
Timothy Jaeryang Baek
4f2e426fc7 refac 2025-09-01 14:27:20 +04:00
Timothy Jaeryang Baek
609a6a3721 refac 2025-09-01 14:22:02 +04:00
Timothy Jaeryang Baek
85153afda8 refac 2025-09-01 14:21:17 +04:00
Timothy Jaeryang Baek
487979859a fix: web/youtube attachments 2025-09-01 01:22:50 +04:00
Timothy Jaeryang Baek
ac0243e8b7 refac 2025-09-01 00:57:13 +04:00
Tim Jaeryang Baek
719d115d49
Merge pull request #17049 from rgaricano/dev-FIX_lex-sem
FIX: Hybrid Search
2025-09-01 00:00:25 +04:00
Tim Jaeryang Baek
4e7b0ea4b4
Merge pull request #17013 from athoik/fix-17000
fix: handle unicode filenames in external document loader
2025-08-31 23:58:52 +04:00
Timothy Jaeryang Baek
c2b4976c82 enh: PGVECTOR_CREATE_EXTENSION env var 2025-08-31 23:58:18 +04:00
_00_
647e38f701
Revert bypass hybrid search when BM25_weight=0
Revert PR https://github.com/open-webui/open-webui/commit/74b1c801
2025-08-30 10:45:35 +02:00
Athanasios Oikonomou
d735b036fe fix: handle unicode filenames in external document loader
Files with special characters in their names (e.g., ü.pdf) caused issues since HTTP headers only allow Latin-1 characters.
This change URL-encodes `X-Filename` before adding it to request headers, preventing failures when uploading or processing such files.

Fixes: #17000
2025-08-28 22:19:50 +03:00
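
The header encoding described in the commit above can be sketched as follows. This is a minimal illustration, not the project's actual loader code: the helper name `encode_filename_header` is hypothetical, and only the `X-Filename` header name comes from the commit message.

```python
from urllib.parse import quote, unquote


def encode_filename_header(filename: str) -> str:
    """Percent-encode a filename so it is safe to send as an HTTP header value.

    Hypothetical helper: non-ASCII characters are encoded as UTF-8 and then
    percent-escaped, so the result contains ASCII characters only.
    """
    return quote(filename)


# Build request headers for a file whose name contains a non-ASCII character.
headers = {"X-Filename": encode_filename_header("ü.pdf")}

# The receiving side can recover the original name with unquote().
original_name = unquote(headers["X-Filename"])
```

The receiver has to decode the value again, so both ends need to agree that the header is percent-encoded.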
Timothy Jaeryang Baek
2bb6063dcb refac/fix: marker 2025-08-28 03:03:31 +04:00
Timothy Jaeryang Baek
23a9731899 refac/fix: hybrid search 2025-08-26 15:04:46 +04:00
Tim Jaeryang Baek
4267e22d4a
Merge pull request #16826 from selenecodes/feat/azure-document-intelligence-azure-entra-auth
feat: Authenticate Azure Document Intelligence using DefaultAzureCredential
2025-08-26 14:32:04 +04:00
_00_
093af754e7
FIX: Playwright Timeout (ms) interpreted as seconds
Fix for Playwright Timeout (ms) interpreted as seconds.

To address https://github.com/open-webui/open-webui/issues/16801

In the frontend the Playwright timeout is set in milliseconds, but the backend interprets it as seconds and applies a time conversion to the playwright_timeout variable (which has to be in ms).

As _originally posted by @rawbby in [#16801](https://github.com/open-webui/open-webui/issues/16801#issuecomment-3216782565)_:

> I personally think milliseconds are a reasonable choice for the timeout. Maybe the conversion should be fixed, not the label.
> This would further not break existing configurations from users that rely on their current config.
>
2025-08-23 14:15:00 +02:00
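
The unit mismatch described in the commit above can be illustrated with a small sketch. Both function names are hypothetical and the code is not taken from the repository; it only assumes, as the commit message states, that the configured value is already in milliseconds, which is the unit Playwright uses for its timeouts.

```python
def to_playwright_timeout_ms(configured_ms: int) -> int:
    """Return the timeout to hand to Playwright, which expects milliseconds.

    Hypothetical helper: the configured value is already in ms, so it is
    passed through without any scaling.
    """
    if configured_ms < 0:
        raise ValueError("timeout must be non-negative")
    return int(configured_ms)


def seconds_misread(configured_ms: int) -> int:
    """Illustrate the mismatch: an ms value treated as seconds and scaled to ms."""
    return int(configured_ms) * 1000


configured = 10_000  # the user meant 10 seconds, entered as milliseconds
```

With the value misread as seconds, a 10 second setting turns into a timeout of roughly 2.8 hours, which is why fixing the conversion rather than the label keeps existing configurations working.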
Selene Blok
5051bfe7ab feat(document retrieval): Authenticate Azure Document Intelligence using DefaultAzureCredential if API key is not provided 2025-08-22 16:15:43 +02:00
Timothy Jaeryang Baek
fbff4e19de fix: reranking 2025-08-22 16:47:05 +04:00
Timothy Jaeryang Baek
60b8cfb9fa refac 2025-08-21 21:48:21 +04:00
Timothy Jaeryang Baek
02479425a5 refac 2025-08-21 12:51:41 +04:00
Timothy Jaeryang Baek
1a15a62b73 chore: format 2025-08-21 04:47:28 +04:00
Tim Jaeryang Baek
7452b87877
Merge pull request #16741 from 0xThresh/s3vector-support
fix: batch S3 vectors in groups of 500 to comply with API limitations
2025-08-20 13:25:42 +04:00
James W.
45d9a720b9
Merge branch 'open-webui:main' into s3vector-support 2025-08-19 22:06:16 -06:00
0xThresh.eth
7fcc545672 fix: batch S3 vectors in groups of 500 to comply with API limitations 2025-08-19 22:05:47 -06:00
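
Batching in groups of 500, as the commit above describes, might look like the sketch below. The helper name `batched` and the sample data are assumptions for illustration; the actual upload call to the S3 Vectors API is left out on purpose.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int = 500) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Stand-in for a list of vectors to insert (hypothetical data).
vectors = list(range(1201))

# Each batch would be sent in its own request, keeping every request within the limit.
batch_sizes = [len(batch) for batch in batched(vectors)]
```

Here `batch_sizes` is `[500, 500, 201]`: the last batch carries the remainder.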
Timothy Jaeryang Baek
f97f21bf3a refac/fix: rename WEB_SEARCH_CONCURRENT_REQUESTS to WEB_LOADER_CONCURRENT_REQUESTS 2025-08-18 20:06:36 +04:00
Tim Jaeryang Baek
0b59aa940e
Merge pull request #16606 from Rain6435/fix/azure-postgresql-pgvector-permissions
fix: resolve Azure PostgreSQL pgvector extension permission issue
2025-08-15 00:59:04 +04:00
Rain6435
a1e62ab422 fix: Formatting 2025-08-14 01:50:57 -04:00
Rain6435
1a42e96a3b fix: resolve Azure PostgreSQL pgvector extension permission issue
Replace direct CREATE EXTENSION commands with conditional checks to avoid
  permission errors on Azure PostgreSQL Flexible Server where only
  azure_pg_admin members can create extensions.

  - Check pg_extension table before attempting to create vector extension
  - Apply same fix to pgcrypto extension for consistency
  - Allows following least privilege principle for database users

  Fixes #12453
2025-08-14 01:45:02 -04:00
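
The conditional check described in the commit above could be sketched like this. It is an illustration under stated assumptions, not the project's implementation: `ensure_extension` is a hypothetical helper, `cursor` is assumed to be any DB-API style cursor, and the `%s` placeholder assumes a psycopg-style driver.

```python
import re


def ensure_extension(cursor, name: str = "vector") -> bool:
    """Create a PostgreSQL extension only if it is not installed yet.

    Returns True if a CREATE EXTENSION statement was issued, False if the
    extension was already present (no privileged statement is executed then).
    """
    # Extension names are interpolated as identifiers, so restrict them strictly.
    if not re.fullmatch(r"[a-z_][a-z0-9_]*", name):
        raise ValueError(f"unexpected extension name: {name!r}")

    # Look the extension up in the pg_extension catalog first.
    cursor.execute("SELECT 1 FROM pg_extension WHERE extname = %s", (name,))
    if cursor.fetchone() is not None:
        return False

    cursor.execute(f"CREATE EXTENSION IF NOT EXISTS {name}")
    return True
```

When an administrator has already installed the extension, a database user without extension privileges never reaches the `CREATE EXTENSION` statement, which is the least-privilege setup the commit mentions.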
Timothy Jaeryang Baek
ad98d4300b refac/fix: milvus query logic 2025-08-14 03:18:38 +04:00
expruc
74b1c80132 disable collection retrieval and bm_25 calculation if bm_25 weight is 0 or less 2025-08-12 15:53:39 +03:00
Timothy Jaeryang Baek
890691319f fix: s3vector import issue 2025-08-11 16:23:08 +04:00
Timothy Jaeryang Baek
21094ca88b fix: pinecone insert issue 2025-08-11 16:22:58 +04:00
Timothy Jaeryang Baek
77189664c2 chore: format 2025-08-09 23:57:35 +04:00
Tim Jaeryang Baek
53425ffadb
Merge pull request #16419 from expruc/feat/qdrant_improvements
feat: qdrant client improvements
2025-08-09 23:52:12 +04:00
expruc
8af9ad3f30 updated query function with scroll too 2025-08-09 22:04:41 +03:00
expruc
88abd01b87 qdrant client improvements 2025-08-09 21:12:30 +03:00
Jan Kessler
3a9601c053
use .rollback() after read-only transaction on pgvector to avoid infinitely idle transactions (and errors in certain scenarios) 2025-08-09 20:09:45 +02:00
Tim Jaeryang Baek
17084f629c
Merge pull request #16385 from gaby/2025-08-08-13-38-31
feat: Propagate upstream OpenAI router errors
2025-08-09 00:58:14 +04:00
Tim Jaeryang Baek
8714df17dd
Merge pull request #16381 from psy42a/patch-1
fix: failure to bind metadata variable on insert for PGVECTOR_PGCRYPTO feature returning syntax error
2025-08-09 00:26:30 +04:00
Juan Calderon-Perez
7619f449c8 Format code base 2025-08-08 10:10:32 -04:00
Juan Calderon-Perez
d2f2d42e09 Format python code 2025-08-08 10:09:31 -04:00
Timothy Jaeryang Baek
8b489cb31f refac: s3 vector 2025-08-08 12:24:47 +04:00
Tim Jaeryang Baek
70eb83b701
Merge pull request #16185 from hiwylee/vector-search-branch
feat: oracle 23ai Vector search for new supported vector db
2025-08-06 14:36:14 +04:00
psy42a
f3b0f7d358
Fix syntax error where the previous use of :metadata::text in some SQLAlchemy/Postgres versions doesn't bind the variable at all
2025-08-05 23:27:50 +10:00
Timothy Jaeryang Baek
e8696c63fe refac 2025-08-04 15:23:43 +04:00
Tim Jaeryang Baek
5db60ca34f
Merge pull request #15903 from Hisma/marker-api-update
feat: Add configurable API URL (for self-hosting) and additional_config parameter for Datalab Marker API
2025-08-04 15:21:03 +04:00
Timothy Jaeryang Baek
7aeca7dee2 refac 2025-08-04 15:12:39 +04:00
hiwylee
bd215a1b96
Merge branch 'dev' into vector-search-branch 2025-08-01 04:23:38 +09:00
hiwylee
0e640dd71e resolve conflict 2025-08-01 02:58:51 +09:00
Timothy Jaeryang Baek
6a17ba5b7a refac: metadata handling in vectordb 2025-07-31 17:45:06 +04:00
Tim Jaeryang Baek
dcade8cdf8
Merge pull request #15785 from bekzod/patch-1
BREAKING CHANGE: Update docling endpoint
2025-07-24 21:09:13 +04:00
Tim Jaeryang Baek
bd18bf5c83
Merge pull request #15951 from 0xThresh/s3vector-support
feat: Add S3 Vector Buckets Support for Knowledge
2025-07-23 12:02:20 +04:00
0xThresh.eth
860f3b3cab chore: run formatting 2025-07-22 22:46:00 -06:00
0xThresh.eth
8dcf668448 chore: final cleanup 2025-07-22 22:37:57 -06:00
0xThresh.eth
d463a29ba1 feat: S3 vector support tested 2025-07-22 21:36:35 -06:00
Hisma
21337a2fd8 ci fix 2025-07-22 22:07:14 -04:00
Hisma
a99e20cc3d add format_lines 2025-07-22 21:06:29 -04:00
Hisma
f31cc07a9d feat: update marker api 2025-07-22 20:49:28 -04:00
Timothy Jaeryang Baek
8bc7d85eac refac 2025-07-22 17:17:26 +04:00
Timothy Jaeryang Baek
bf3c807047 refac 2025-07-22 11:38:47 +04:00
0xThresh.eth
f6ee1965cb merge main 2025-07-21 18:06:17 -06:00
0xThresh.eth
5c59c50e2d more progress on s3 vector 2025-07-20 16:48:23 -06:00
bekzod
4bc054a347
Update docling endpoint 2025-07-16 20:40:13 +05:00
0xThresh.eth
d9f2b6b14e feat: add starter config for s3 vector 2025-07-15 21:20:54 -06:00
Timothy Jaeryang Baek
500e6e64fe refac 2025-07-15 21:57:24 +04:00
Timothy Jaeryang Baek
92c9068369 refac 2025-07-14 17:50:03 +04:00
Timothy Jaeryang Baek
18bd83413b refac 2025-07-14 14:05:06 +04:00
Timothy Jaeryang Baek
0013f5c1fc refac/enh: forward user info header to reranker 2025-07-14 13:59:10 +04:00
Timothy Jaeryang Baek
b4f04ff3a7 enh/refac: pgvector pool support 2025-07-14 12:18:44 +04:00
Tim Jaeryang Baek
9b84a8e443
Merge pull request #15632 from athoik/quote
fix: don't over quote forwarded headers
2025-07-12 00:24:29 +04:00
Timothy Jaeryang Baek
77c1905609 refac 2025-07-11 12:35:42 +04:00
Timothy Jaeryang Baek
033d07ee23 refac: file handling 2025-07-11 12:29:17 +04:00
Timothy Jaeryang Baek
3b9d86de0b refac 2025-07-11 12:00:21 +04:00
Athanasios Oikonomou
96758176cc fix: don't over quote forwarded headers
The fix introduced in #15035 over-quotes headers.

E.g. emails are shown as user%40example.com instead of user@example.com
E.g. names are shown as First%20Last instead of First Last

We also spend time quoting IDs and roles unnecessarily.

Keep quoting only for the user name, where the problem originally appeared according to the discussion
https://github.com/open-webui/open-webui/discussions/14391

Also add the space to the safe characters, in order to remove %20 from names.
2025-07-10 22:08:28 +03:00
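
The quoting behaviour described in the commit above can be reproduced with `urllib.parse.quote`. This is a sketch only: the helper name `forwarded_user_headers` and the header names are used for illustration and are not confirmed by the commit message.

```python
from urllib.parse import quote


def forwarded_user_headers(name: str, email: str, user_id: str, role: str) -> dict:
    """Build forwarded user headers, percent-encoding only the user name.

    Hypothetical helper; header names are illustrative. The space is kept in
    the safe set so that "First Last" is not turned into "First%20Last".
    """
    return {
        "X-OpenWebUI-User-Name": quote(name, safe=" "),
        "X-OpenWebUI-User-Email": email,  # left unquoted
        "X-OpenWebUI-User-Id": user_id,   # left unquoted
        "X-OpenWebUI-User-Role": role,    # left unquoted
    }


# Over-quoting, as described above: every value passed through quote().
over_quoted_email = quote("user@example.com")
over_quoted_name = quote("First Last")

headers = forwarded_user_headers("First Last", "user@example.com", "42", "admin")
```

Names with non-ASCII characters are still percent-encoded, so the header value stays ASCII while plain names remain readable.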
Wonyong Lee
46e0992a83 json_serialize returning varchar2(2096) 2025-07-10 12:12:43 +00:00
Timothy Jaeryang Baek
8d84b4c2a4 enh/refac: temp chat file upload behaviour
client-side content extraction
2025-07-09 22:59:37 +04:00
Timothy Jaeryang Baek
b3c4bc6041 enh: allow full context mode for collections 2025-07-09 01:29:49 +04:00
Timothy Jaeryang Baek
d5f9bbc7a7 enh: reference note in chat 2025-07-09 01:17:25 +04:00
Tim Jaeryang Baek
a748f19ac2
Merge pull request #15548 from expruc/fix/docling_ignore_html
fix: text/html files being detected as text when loaded with docling/tika
2025-07-08 13:16:01 +04:00
Oracle Public Cloud User
e0afd7f496 final: vector-search-feature 2025-07-07 17:25:16 +00:00
Oracle Public Cloud User
12ebdbae81 refactor oracle23ai.py 2025-07-07 16:21:34 +00:00
Oracle Public Cloud User
25e241ae41 added new feature : oracle23ai vector search 2025-07-07 12:13:05 +00:00
Timothy Jaeryang Baek
3e15c8ab69 refac 2025-07-07 15:56:05 +04:00
Oracle Public Cloud User
b56dbb26be alpha2 2025-07-07 08:52:58 +00:00
Oracle Public Cloud User
3e2fd074bb oracle 23ai vector search 2025-07-07 05:58:02 +00:00
expruc
453a2bd9b5 fixed issue where text/html files were being detected as text when loaded 2025-07-06 20:10:26 +03:00
Anush008
17debaa6de
chore: Raise if QDRANT_URI is not set
Signed-off-by: Anush008 <anushshetty90@gmail.com>
2025-07-04 13:17:46 +05:30
Anush008
c8a49d373a
refactor: Removed more swallows
Signed-off-by: Anush008 <anushshetty90@gmail.com>
2025-07-04 12:38:22 +05:30
Anush008
0ac57a088f
refactor: More implementation improvements
Signed-off-by: Anush008 <anushshetty90@gmail.com>
2025-07-04 12:33:54 +05:30
Anush008
7c734d3fea
Merge remote-tracking branch 'origin/dev' into Anush008/main
Signed-off-by: Anush008 <anushshetty90@gmail.com>
2025-07-04 12:22:08 +05:30
Tim Jaeryang Baek
600344f2e8
Merge pull request #15510 from kopero2000/bug/oauth_logout_fix
fix/oauth logout fix
2025-07-04 10:30:02 +04:00
Bela Vizi
9623ef4360 add trust env to clientsession 2025-07-02 17:59:56 +02:00
guenhter
5c2e0e4beb feat: add qdrant indices for metadata fields
All field names that are part of a query should
have an index for performance reasons. This is
even enforced on some Qdrant clusters, like those
on qdrant.io, where queries using an unindexed field
fail with an error.
2025-06-29 15:30:55 +02:00
Timothy Jaeryang Baek
1b064a6c85 chore: format 2025-06-28 15:21:20 +04:00
guenhter
a66206f44f feat: support better qdrant collection isolation
The prefix string for Qdrant collections is now
configurable, which means the same Qdrant cluster
can host multiple Open WebUI instances while
keeping the collections of the different
instances separate.
2025-06-26 13:52:26 +02:00