140 commits

Author SHA1 Message Date
Timothy Jaeryang Baek
5e1f4fa0ff feat: async file upload 2025-08-20 00:36:13 +04:00
Timothy Jaeryang Baek
f97f21bf3a refac/fix: rename WEB_SEARCH_CONCURRENT_REQUESTS to WEB_LOADER_CONCURRENT_REQUESTS 2025-08-18 20:06:36 +04:00
Timothy Jaeryang Baek
e8cb57750b refac/fix: serply 2025-08-09 00:37:37 +04:00
Timothy Jaeryang Baek
736b29ddca refac 2025-08-09 00:33:41 +04:00
Tim Jaeryang Baek
5db60ca34f
Merge pull request #15903 from Hisma/marker-api-update
feat: Add configurable API URL (for self-hosting) and additional_config parameter for Datalab Marker API
2025-08-04 15:21:03 +04:00
Timothy Jaeryang Baek
6a17ba5b7a refac: metadata handling in vectordb 2025-07-31 17:45:06 +04:00
Hisma
a99e20cc3d add format_lines 2025-07-22 21:06:29 -04:00
Hisma
f31cc07a9d feat: update marker api 2025-07-22 20:49:28 -04:00
Azure Wang
9aff166f83 - fix: keep reranker_model config been removed by web search config 2025-07-16 23:51:23 +08:00
Timothy Jaeryang Baek
abe280f0a3 refac/fix: reranking function 2025-07-16 13:56:02 +04:00
Timothy Jaeryang Baek
18bd83413b refac 2025-07-14 14:05:06 +04:00
Timothy Jaeryang Baek
0013f5c1fc refac/enh: forward user info header to reranker 2025-07-14 13:59:10 +04:00
Timothy Jaeryang Baek
87847ab31a chore: format 2025-07-13 00:15:16 +04:00
Tim Jaeryang Baek
e3b8f700e4
Merge pull request #14264 from diwakar-s-maurya/patch-6
feat: add langchain markdown document splitter
2025-07-08 15:55:20 +04:00
Tim Jaeryang Baek
2bad7eaa07
Merge pull request #15277 from hankewyczz/bug/restore-exa-search
fix Restore exa
2025-06-25 11:04:48 +04:00
Zachar Hankewycz
45d7726ee0
Restore exa 2025-06-24 21:24:53 -04:00
zhangtyzzz
5f60b30320
add missed exa 2025-06-19 13:52:58 +08:00
Timothy Jaeryang Baek
6c54ca552a feat: global image compression 2025-06-16 16:52:57 +04:00
Timothy Jaeryang Baek
f3cae94028 fix: bypass webloader
Co-Authored-By: WilliamGates <3852641+williamgateszhao@users.noreply.github.com>
2025-06-16 16:17:52 +04:00
Timothy Jaeryang Baek
0cd400f5ee refac: docling picture describe params 2025-06-08 20:02:14 +04:00
Vaclav Cerny
99f05561f8 Add configuration options for picture description modes and update related components 2025-06-08 16:30:26 +02:00
Diwakar Singh Maurya
871efb4ad9 feat: add langchain markdown document splitter 2025-06-07 06:02:53 +00:00
Dave
96e9bfe0e5 feat: add Perplexity model and search context usage configuration options 2025-06-03 00:19:08 +02:00
Timothy Jaeryang Baek
e1e2c096e2 refac: PLEASE follow existing convention 2025-05-30 00:34:18 +04:00
Tim Jaeryang Baek
ff353578db
Merge pull request #14370 from daw/feat/add-azure-openai-embeddings-option
feat:Add Azure OpenAI embedding support
2025-05-30 00:18:55 +04:00
Tim Jaeryang Baek
042c37ea34
Merge pull request #14311 from Hisma/marker-api-content-extraction
feat: Marker api content extraction support
2025-05-29 02:21:13 +04:00
Timothy Jaeryang Baek
4461122a0e fix: /api/v1/retrieval/query/collection endpoint 2025-05-28 18:45:47 +04:00
Hisma
a9405cc101 feat: Marker api content extraction support 2025-05-27 00:44:07 -04:00
Tim Jaeryang Baek
e663b90a9f
Merge pull request #14069 from Ithanil/bm25_weight
feat: Configurable weight for BM25Retriever during hybrid search
2025-05-24 01:13:03 +04:00
Jan Kessler
e70dd33233
rename BM25_WEIGHT -> HYBRID_BM25_WEIGHT 2025-05-23 22:06:44 +02:00
Timothy Jaeryang Baek
2eca6f6414 feat: bypass web loader in web search
Co-Authored-By: Perry Li <peiyaoli@mail.nankai.edu.cn>
Co-Authored-By: WilliamGates <3852641+williamgateszhao@users.noreply.github.com>
2025-05-23 02:30:35 +04:00
Jan Kessler
308d8ac04a
make bm25_weight a regular parameter of query_doc.. / get_sources_from_files functions 2025-05-20 11:46:32 +02:00
Jan Kessler
b5ddaf6417
make weight for bm25 retriever in hybrid search ui-configurable 2025-05-20 10:39:31 +02:00
Derek Wischusen
42be1f956a Add Azure OpenAI embedding support 2025-05-19 22:58:04 -04:00
Timothy Jaeryang Baek
2bd7db12a2 enh: ALLOWED_FILE_EXTENSIONS ui 2025-05-16 21:05:52 +04:00
Timothy Jaeryang Baek
8732b64b6b feat: external document loader support 2025-05-14 22:28:40 +04:00
Timothy Jaeryang Baek
de70d0cb64 feat: docling do picture description support 2025-05-14 21:26:49 +04:00
hwzhuhao
6f869ded43 feat:Add vector type and vector factory class for vector database integration 2025-05-14 21:30:50 +08:00
Timothy Jaeryang Baek
6f635d8b7d refac 2025-05-10 19:16:09 +04:00
Timothy Jaeryang Baek
be912f1529 refac 2025-05-10 18:29:04 +04:00
Timothy Jaeryang Baek
d5fd3b3600 feat: external reranker
Co-Authored-By: Brendan Campbell <20541191+bcambs09@users.noreply.github.com>
2025-05-10 18:25:20 +04:00
Timothy Jaeryang Baek
34ec10a78c refac: web search performance
Co-Authored-By: Mabeck <64421281+mmabeck@users.noreply.github.com>
2025-05-10 17:54:41 +04:00
tth37
c95a65a4bd fix: Duplicate web search urls 2025-05-09 20:06:35 +08:00
Timothy Jaeryang Baek
b50dcb1862 refac: remove duplicate urls 2025-05-07 22:25:18 +04:00
Athanasios Oikonomou
657162e96d feat(ocr): add support for Docling OCR engine and language configuration
This commit adds support for configuring the OCR engine and language(s) for Docling.
Configuration can be set via the environment variables `DOCLING_OCR_ENGINE` and `DOCLING_OCR_LANG`, or through the UI.

Fixes #13133
2025-05-03 00:32:06 +03:00
Tim Jaeryang Baek
e87f2669fa
Merge pull request #13191 from tth37/feat_firecrawl_search_engine
feat: Add Firecrawl search engine
2025-04-29 08:38:28 -07:00
Tim Jaeryang Baek
7b863465a9
Merge pull request #13311 from stephen304/yacy-support
feat: Yacy search support
2025-04-29 08:35:10 -07:00
Stephen Smith
240d91d38d Add yacy config for user/pass, automatically add yacy json api path 2025-04-26 22:28:30 -04:00
Stephen Smith
0f73b96616 first pass at yacy support copied from searxng 2025-04-26 14:07:13 -04:00
tth37
92dbeb1939 feat: Add Firecrawl search engine 2025-04-24 14:57:28 +08:00