171 commits

Author SHA1 Message Date
Timothy Jaeryang Baek
0cd400f5ee refac: docling picture describe params 2025-06-08 20:02:14 +04:00
Vaclav Cerny
99f05561f8 Add configuration options for picture description modes and update related components 2025-06-08 16:30:26 +02:00
Diwakar Singh Maurya
871efb4ad9 feat: add langchain markdown document splitter 2025-06-07 06:02:53 +00:00
Dave
96e9bfe0e5 feat: add Perplexity model and search context usage configuration options 2025-06-03 00:19:08 +02:00
Timothy Jaeryang Baek
e1e2c096e2 refac: PLEASE follow existing convention 2025-05-30 00:34:18 +04:00
Tim Jaeryang Baek
ff353578db Merge pull request #14370 from daw/feat/add-azure-openai-embeddings-option
feat: Add Azure OpenAI embedding support
2025-05-30 00:18:55 +04:00
Tim Jaeryang Baek
042c37ea34 Merge pull request #14311 from Hisma/marker-api-content-extraction
feat: Marker api content extraction support
2025-05-29 02:21:13 +04:00
Timothy Jaeryang Baek
4461122a0e fix: /api/v1/retrieval/query/collection endpoint 2025-05-28 18:45:47 +04:00
Hisma
a9405cc101 feat: Marker api content extraction support 2025-05-27 00:44:07 -04:00
Tim Jaeryang Baek
e663b90a9f Merge pull request #14069 from Ithanil/bm25_weight
feat: Configurable weight for BM25Retriever during hybrid search
2025-05-24 01:13:03 +04:00
Jan Kessler
e70dd33233 rename BM25_WEIGHT -> HYBRID_BM25_WEIGHT 2025-05-23 22:06:44 +02:00
Timothy Jaeryang Baek
2eca6f6414 feat: bypass web loader in web search
Co-Authored-By: Perry Li <peiyaoli@mail.nankai.edu.cn>
Co-Authored-By: WilliamGates <3852641+williamgateszhao@users.noreply.github.com>
2025-05-23 02:30:35 +04:00
Jan Kessler
308d8ac04a make bm25_weight a regular parameter of query_doc.. / get_sources_from_files functions 2025-05-20 11:46:32 +02:00
Jan Kessler
b5ddaf6417 make weight for bm25 retriever in hybrid search ui-configurable 2025-05-20 10:39:31 +02:00
Derek Wischusen
42be1f956a Add Azure OpenAI embedding support 2025-05-19 22:58:04 -04:00
Timothy Jaeryang Baek
2bd7db12a2 enh: ALLOWED_FILE_EXTENSIONS ui 2025-05-16 21:05:52 +04:00
Timothy Jaeryang Baek
8732b64b6b feat: external document loader support 2025-05-14 22:28:40 +04:00
Timothy Jaeryang Baek
de70d0cb64 feat: docling do picture description support 2025-05-14 21:26:49 +04:00
hwzhuhao
6f869ded43 feat: Add vector type and vector factory class for vector database integration 2025-05-14 21:30:50 +08:00
Timothy Jaeryang Baek
6f635d8b7d refac 2025-05-10 19:16:09 +04:00
Timothy Jaeryang Baek
be912f1529 refac 2025-05-10 18:29:04 +04:00
Timothy Jaeryang Baek
d5fd3b3600 feat: external reranker
Co-Authored-By: Brendan Campbell <20541191+bcambs09@users.noreply.github.com>
2025-05-10 18:25:20 +04:00
Timothy Jaeryang Baek
34ec10a78c refac: web search performance
Co-Authored-By: Mabeck <64421281+mmabeck@users.noreply.github.com>
2025-05-10 17:54:41 +04:00
tth37
c95a65a4bd fix: Duplicate web search urls 2025-05-09 20:06:35 +08:00
Timothy Jaeryang Baek
b50dcb1862 refac: remove duplicate urls 2025-05-07 22:25:18 +04:00
Athanasios Oikonomou
657162e96d feat(ocr): add support for Docling OCR engine and language configuration
This commit adds support for configuring the OCR engine and language(s) for Docling.
Configuration can be set via the environment variables `DOCLING_OCR_ENGINE` and `DOCLING_OCR_LANG`, or through the UI.

Fixes #13133
2025-05-03 00:32:06 +03:00
Tim Jaeryang Baek
e87f2669fa Merge pull request #13191 from tth37/feat_firecrawl_search_engine
feat: Add Firecrawl search engine
2025-04-29 08:38:28 -07:00
Tim Jaeryang Baek
7b863465a9 Merge pull request #13311 from stephen304/yacy-support
feat: Yacy search support
2025-04-29 08:35:10 -07:00
Stephen Smith
240d91d38d Add yacy config for user/pass, automatically add yacy json api path 2025-04-26 22:28:30 -04:00
Stephen Smith
0f73b96616 first pass at yacy support copied from searxng 2025-04-26 14:07:13 -04:00
tth37
92dbeb1939 feat: Add Firecrawl search engine 2025-04-24 14:57:28 +08:00
Timothy Jaeryang Baek
732d7aee70 enh: sentence transformers env vars
Co-Authored-By: DrZoidberg09 <96449693+drzoidberg09@users.noreply.github.com>
2025-04-24 01:55:18 +09:00
Timothy Jaeryang Baek
09874ab83d fix: FireCrawlLoader 2025-04-24 01:40:34 +09:00
Timothy Jaeryang Baek
43efff0fe6 refac 2025-04-22 23:22:50 +09:00
Tim Jaeryang Baek
87844a8042 Merge pull request #12822 from tth37/feat_external_search_loader
feat: Support for Self-Hosted/External Web Search/Loader Engines
2025-04-18 23:51:27 -07:00
Youggls
9669cd3454 fix: use run_in_threadpool for search_web to prevent blocking
Used fastapi's run_in_threadpool function to execute the search_web function,
preventing the synchronous function from blocking the entire web search process.
2025-04-17 17:23:20 +08:00
tth37
85f8e91288 feat: Allow admin editing external search/loader settings 2025-04-14 18:19:26 +08:00
Timothy Jaeryang Baek
70718dda90 refac 2025-04-13 22:31:43 -07:00
tth37
839ba22c90 feat: Backend for Self-Hosted/External Web Search/Loader Engines 2025-04-14 01:49:05 +08:00
Timothy Jaeryang Baek
888b468576 fix 2025-04-12 23:00:34 -07:00
Timothy Jaeryang Baek
4dafbbccfc fix: rag template display issue 2025-04-12 22:55:24 -07:00
tth37
8d53f1e770 fix: small bugs on updated web/rag settings 2025-04-13 12:55:50 +08:00
Timothy Jaeryang Baek
48a23ce3fe refac: web/rag config 2025-04-12 16:33:36 -07:00
tth37
5eac5960ef feat: Add frontend configuration for web loader 2025-04-12 17:13:30 +08:00
Youggls
3e2a6df1fb feat: Add sougou web search API for backend, add config panel for frontend. 2025-04-10 14:51:44 +08:00
Timothy Jaeryang Baek
914eb49767 chore: include accelerate dependency 2025-04-06 17:44:05 -07:00
Timothy Jaeryang Baek
cbe2056587 fix: audio file upload response issue 2025-04-06 17:31:50 -07:00
Timothy Jaeryang Baek
f243e523a6 refac 2025-04-06 15:52:38 -07:00
Timothy Jaeryang Baek
155dbd5a66 refac 2025-04-06 15:45:48 -07:00
Timothy Jaeryang Baek
9825d03602 Merge pull request #12507 from Ithanil/fix_web_result_collection_source_ids
fix: fix web results all getting the same source id when using embedding and retrieval
2025-04-06 15:43:21 -07:00
Jan Kessler
a506a1a61e only keep URLs as sources for which the content could actually be retrieved 2025-04-06 20:31:12 +02:00
Jan Kessler
4476060044 fix web results all getting the same source id when using embedding and retrieval 2025-04-06 15:51:05 +02:00
Marko Henning
3b2b6e183d Added missing parameter for query_doc_with_hybrid_search. 2025-04-04 15:30:57 +02:00
Timothy Jaeryang Baek
94bf49440d enh: unload hybrid model if set to False 2025-04-02 18:15:14 -07:00
Patrick Wachter
1ac6879268 Add Mistral OCR integration and configuration support 2025-04-01 14:24:33 +02:00
Timothy Jaeryang Baek
cafc5413f5 refac 2025-03-31 14:13:27 -07:00
Timothy Jaeryang Baek
d542881ee4 refac 2025-03-30 21:55:20 -07:00
Timothy Jaeryang Baek
433b5bddc1 Merge pull request #8594 from jayteaftw/main
feat: Support for instruct/prefixing embeddings
2025-03-30 21:54:44 -07:00
Timothy Jaeryang Baek
4a79320253 chore: format 2025-03-27 01:40:28 -07:00
Timothy Jaeryang Baek
9d834a8e90 Merge branch 'dev' into k_reranker 2025-03-26 20:50:31 -07:00
Marko Henning
41a4cf7106 Added new k_reranker parameter 2025-03-06 10:47:57 +01:00
Fabio Polito
9aa407dbd2 feat: merge with main 2025-03-05 22:04:34 +00:00
Timothy Jaeryang Baek
efe8c4ca69 chore: format 2025-03-01 07:28:00 -08:00
Timothy Jaeryang Baek
d0ddb0637e enh: web embed bypass embedding and retrieval support 2025-02-27 16:34:05 -08:00
Timothy Jaeryang Baek
1b56a8f3cb Merge pull request #10864 from kurtdami/perplexity_integration
feat: add perplexity integration to web search
2025-02-27 13:51:03 -08:00
kurtdami
b061775932 feat: add perplexity integration to web search 2025-02-27 00:30:48 -08:00
Timothy Jaeryang Baek
57010901e6 enh: bypass embedding and retrieval 2025-02-26 15:42:19 -08:00
Timothy Jaeryang Baek
78a8ef8e66 refac: audio file handling 2025-02-26 13:09:52 -08:00
Timothy Jaeryang Baek
3be5e3129b Merge pull request #10752 from NovoNordisk-OpenSource/yvedeng/standardize-logging
refactor: replace print statements with logging
2025-02-25 10:53:02 -08:00
Yifang Deng
0e5d5ecb81 refactor: replace print statements with logging for better error tracking 2025-02-25 15:53:55 +01:00
hurxxxx
4cc3102758 feat: onedrive file picker integration 2025-02-25 01:47:07 +09:00
Timothy Jaeryang Baek
b14e75dd6c feat: added Trust Proxy Environment switch in Web Search admin settings tab.
Co-Authored-By: harry zhou <67385896+harryzhou2000@users.noreply.github.com>
2025-02-21 13:40:11 -08:00
Timothy Jaeryang Baek
ab1b910d80 Merge pull request #10486 from Micca/feature/document_intelligence_support
Feat: Adding Support for Azure AI Document Intelligence for Content Extraction (Revised)
2025-02-21 10:56:18 -08:00
Timothy Jaeryang Baek
81715f6553 enh: RAG full context mode 2025-02-18 21:14:58 -08:00
Rory
10e0c81de9 Merge remote-tracking branch 'upstream/dev' into playwright
# Conflicts:
#	backend/open_webui/retrieval/web/utils.py
#	backend/open_webui/routers/retrieval.py
2025-02-17 21:53:39 -06:00
Timothy Jaeryang Baek
ba6cde8a87 fix: include_domain does NOT exist 2025-02-17 19:20:49 -08:00
Timothy Jaeryang Baek
ca0b7217d2 enh: full context web search 2025-02-17 18:14:26 -08:00
Rory
b1bab2ece8 Remove duplicate loader.alazy_load line from merge 2025-02-14 22:43:46 -06:00
Rory
4da220c513 Merge remote-tracking branch 'upstream/dev' into playwright
# Conflicts:
#	backend/open_webui/config.py
#	backend/open_webui/main.py
#	backend/open_webui/retrieval/web/utils.py
#	backend/open_webui/routers/retrieval.py
#	backend/open_webui/utils/middleware.py
#	pyproject.toml
2025-02-14 20:48:22 -06:00
Guofeng Yi
b38acc8559 Merge branch 'dev' into feate-webloader-support-proxy 2025-02-15 09:50:02 +08:00
Fabio Polito
2419ef06a0 feat: docling support for document preprocessing 2025-02-14 12:08:03 +00:00
Yimi81
d3f71930f0 web loader support proxy 2025-02-14 07:15:09 +00:00
Yimi81
ceef600223 support async load for websearch 2025-02-14 07:05:10 +00:00
Timothy Jaeryang Baek
304aed0f13 chore: format 2025-02-13 22:54:45 -08:00
Timothy Jaeryang Baek
7b37cdcebb Merge pull request #9980 from xring/web_search_serpapi
feat: add web search via SerpApi
2025-02-13 22:51:14 -08:00
Timothy Jaeryang Baek
c9a8808b0d refac 2025-02-13 21:45:29 -08:00
xring
27d395ba06 feat: add web search via SerpApi 2025-02-14 12:24:58 +08:00
Rory
40d4db97e6 Merge remote-tracking branch 'upstream/dev' into playwright 2025-02-12 22:32:44 -06:00
Timothy Jaeryang Baek
8906a2e260 Merge pull request #9803 from BochaAI/main
add Bocha
2025-02-11 21:01:04 -08:00
luckyman-yan
31360fe991 add Bocha 2025-02-10 16:44:47 +08:00
Rory
2c711d8365 Merge remote-tracking branch 'upstream/dev' into playwright
# Conflicts:
#	backend/requirements.txt
2025-02-09 23:52:21 -06:00
Timothy Jaeryang Baek
3c5ac4ace5 Merge pull request #9416 from abdalrohman/fix_filter_domains
feat(ui): implement domain filter list for web search settings
2025-02-07 11:23:15 -08:00
Mazurek Michal
35f3824932 feat: Implement Document Intelligence as Content Extraction Engine 2025-02-07 13:44:47 +01:00
Rory
ec6fe9939b Merge remote-tracking branch 'upstream/dev' into playwright 2025-02-05 17:47:58 -06:00
JT
40dea3fbe1 Merge branch 'dev' into main 2025-02-05 15:15:24 -08:00
M.Abdulrahman Alnaseer
68703951e8 feat(ui): implement domain filter list for web search settings 2025-02-05 19:14:40 +03:00
Timothy Jaeryang Baek
e41a2682f5 chore: format 2025-02-05 00:07:45 -08:00
Timothy Jaeryang Baek
f6f8c08cb0 Merge pull request #9068 from df-cgdm/main
**feat** Add user related headers when calling an external embedding api
2025-02-05 00:05:44 -08:00
JT
81102f4be2 Merge branch 'open-webui:main' into main 2025-02-04 13:06:04 -08:00
jvinolus
7b8e5d4e7c Fixed errors and added more support 2025-02-04 13:04:36 -08:00