Commit graph

475 commits

Author SHA1 Message Date
lucy
b0965a8184
fixes #14752 and adds manual transcription option 2025-06-08 14:26:24 +02:00
Timothy Jaeryang Baek
5e35aab292 chore: format 2025-06-05 01:12:28 +04:00
Tim Jaeryang Baek
7c4f261aa2
Merge pull request #14616 from Davixk/feat/new-perplexity-options
feat: add Perplexity AI model and search context usage configuration options
2025-06-05 00:28:00 +04:00
Vaclav Cerny
9772c18b20 fix(loader): remove deprecated picture description configuration 2025-06-04 17:21:44 +02:00
Vaclav Cerny
c71236ba07 feat(loader): enhance picture description prompt for improved detail and clarity 2025-06-04 14:25:31 +02:00
Vaclav Cerny
c4278f4784 fix description vs classification mismatch 2025-06-04 14:13:00 +02:00
Vaclav Cerny
8644e81a1c feat(loader): add picture description configuration for DoclingLoader 2025-06-04 12:34:39 +02:00
Timothy Jaeryang Baek
4d364e2967 refac: remove msg from known type 2025-06-03 16:27:28 +04:00
Dave
77b357c73b fix: update label for search context usage to clarify its purpose 2025-06-03 00:27:07 +02:00
Dave
96e9bfe0e5 feat: add Perplexity model and search context usage configuration options 2025-06-03 00:19:08 +02:00
Tim Jaeryang Baek
3c32d2cada
Merge pull request #14539 from PVBLIC-F/refac/mistral
perf mistral.py Enhance for Overall Speed and Efficiency
2025-06-02 23:52:59 +04:00
PVBLIC Foundation
cf3635ba25
Update mistral.py
1. Intelligent Error Handling
   - Added _is_retryable_error() method to distinguish retryable vs non-retryable errors
   - Prevents unnecessary retries on client errors (4xx) that won't succeed
   - Caps retry delay at 30 seconds to prevent excessive waiting
2. Optimized Timeout Configuration
   - Upload: Capped at 2 minutes (was using full 5-minute timeout)
   - URL requests: 30 seconds (should be fast)
   - OCR processing: Full timeout (can take time)
   - Cleanup: 30 seconds (should be quick)
3. Enhanced Connection Pool
   - Increased connection limits: 20 total, 10 per host
   - Longer DNS cache TTL (10 minutes vs 5 minutes)
   - Increased keepalive timeout (60s vs 30s)
   - Added async DNS resolver for better performance
   - Granular timeout controls (connect, read, total)
4. Concurrency Control for Batch Processing
   - Added semaphore-based concurrency control (default: 5 concurrent)
   - Prevents API overwhelming while maintaining throughput
   - Configurable concurrency limit per workload
5. Memory Efficient Result Processing
   - Early exit for empty content validation
   - Better error metadata for debugging
   - Added content length tracking
   - Streamlined page processing logic
6. General Performance Improvements
   - Better error logging with truncated responses
   - Optimized metadata creation
   - Improved debug logging efficiency
2025-05-30 20:06:29 -07:00
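The retry and concurrency points in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code from mistral.py: the helper names (`is_retryable_status`, `retry_delay`, `run_limited`), the exact set of retryable statuses, and the 1-second backoff base are all assumptions. Only the 30-second cap and the default limit of 5 come from the message.

```python
import asyncio

# Assumption: the commit only says that 4xx client errors which won't succeed
# are not retried. Treating 408 and 429 as transient is a guess.
RETRYABLE_CLIENT_STATUSES = {408, 429}


def is_retryable_status(status: int) -> bool:
    """Return True for server errors and a few transient client errors."""
    return status >= 500 or status in RETRYABLE_CLIENT_STATUSES


def retry_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff (base * 2**attempt), never longer than `cap` seconds."""
    return min(cap, base * (2 ** attempt))


async def run_limited(jobs, limit: int = 5):
    """Run zero-argument coroutine functions with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        # Each job waits here until one of the `limit` slots is free.
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Capping the delay keeps a long run of failures from stretching a single request into minutes of waiting, and the semaphore bounds load on the remote API without serializing the whole batch.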
PVBLIC Foundation
66bde32623
Update pinecone.py 2025-05-30 18:47:23 -07:00
PVBLIC Foundation
4ecf2a8685
Update pinecone.py
May 2025 Latest Pinecone Best Practices
2025-05-30 09:33:57 -07:00
Timothy Jaeryang Baek
9306ae5972 refac 2025-05-30 01:19:56 +04:00
Timothy Jaeryang Baek
e1e2c096e2 refac: PLEASE follow existing convention 2025-05-30 00:34:18 +04:00
Tim Jaeryang Baek
ff353578db
Merge pull request #14370 from daw/feat/add-azure-openai-embeddings-option
feat: Add Azure OpenAI embedding support
2025-05-30 00:18:55 +04:00
Timothy Jaeryang Baek
7dc7d5c028 refac: PLEASE FOLLOW EXISTING CONVENTION 2025-05-29 03:47:02 +04:00
Timothy Jaeryang Baek
551597b9cc chore: format 2025-05-29 02:36:33 +04:00
Hisma
e12a79c0e2 fix: handle json output format correctly 2025-05-27 01:12:03 -04:00
Hisma
a9405cc101 feat: Marker api content extraction support 2025-05-27 00:44:07 -04:00
Timothy Jaeryang Baek
da75d0ca1e chore: format 2025-05-24 02:13:54 +04:00
Tim Jaeryang Baek
e663b90a9f
Merge pull request #14069 from Ithanil/bm25_weight
feat: Configurable weight for BM25Retriever during hybrid search
2025-05-24 01:13:03 +04:00
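One way a configurable BM25 weight can enter a hybrid search is through weighted reciprocal rank fusion, sketched below. This is an assumption about the general technique, not a copy of the merged code: the function names, the complementary weight `1 - bm25_weight` for the vector retriever, and the constant `c = 60` are all illustrative.

```python
from collections import defaultdict


def weighted_rrf(rankings: list, weights: list, c: int = 60) -> list:
    """Fuse ranked lists of document ids with weighted reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more score the higher it sits in each list.
            scores[doc_id] += weight / (c + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


def hybrid_rank(bm25_ranking: list, vector_ranking: list, bm25_weight: float = 0.5) -> list:
    """Hypothetical helper: BM25 gets `bm25_weight`, the vector retriever the rest."""
    return weighted_rrf(
        [bm25_ranking, vector_ranking],
        [bm25_weight, 1.0 - bm25_weight],
    )
```

With a weight of 1.0 the fused order follows the BM25 ranking alone, and with 0.0 it follows the vector ranking alone; values in between blend the two.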
Timothy Jaeryang Baek
8b5e89eada chore: format 2025-05-24 00:43:38 +04:00
Jan Kessler
e70dd33233
rename BM25_WEIGHT -> HYBRID_BM25_WEIGHT 2025-05-23 22:06:44 +02:00
Tim Jaeryang Baek
c8f1bdf928
Merge pull request #14245 from PVBLIC-F/dev
perf Update mistral.py
2025-05-23 21:57:16 +04:00
PVBLIC Foundation
bf193dfb5d
Update mistral.py 2025-05-23 10:00:19 -07:00
Timothy Jaeryang Baek
aac25eac9e refac: reranker
Co-Authored-By: Tornike Gurgenidze <togurg14@freeuni.edu.ge>
2025-05-23 01:29:48 +04:00
Tim Jaeryang Baek
da4aa5f08b
Merge pull request #14152 from U8F69/fix_user_auth
fix(auth): correctly use password hash when duplicate email records exist
2025-05-22 14:58:10 +04:00
U8F69
dd6124a84f
fix(auth): fix invalid password use in auth 2025-05-22 11:03:43 +08:00
PVBLIC Foundation
86e24bb4aa
Update pinecone.py
Changes to pinecone.py:
- Updated from the deprecated PineconeGRPC client to the newer Pinecone client
- Modified the client initialization code to match the new API requirements
- Added better response handling with getattr() to safely access attributes from response objects
- Removed the streaming_upsert method, which is not available in the newer client
- Added safer attribute access with fallbacks throughout the code
- Updated the close method to reflect that the newer client doesn't need explicit closing
These changes ensure the code is compatible with the latest Pinecone Python SDK and will be more robust against future changes. The key improvement is migrating away from the deprecated gRPC client which will eventually stop working.
2025-05-21 15:28:42 -07:00
Tim Jaeryang Baek
d3c7628092
Merge pull request #14059 from sreesdas/main
fix: resolve issue where external document loader was not invoked
2025-05-20 17:43:06 +04:00
Tim Jaeryang Baek
fac5884d8c
Merge pull request #14073 from tth37/fix_default_web_loader_verify_ssl
fix: Default web loader fail silently when `verify_ssl=False`
2025-05-20 17:24:22 +04:00
tth37
78befd5a2f fix: Default web loader fail when verify_ssl=False 2025-05-20 19:44:18 +08:00
Jan Kessler
308d8ac04a
make bm25_weight a regular parameter of query_doc.. / get_sources_from_files functions 2025-05-20 11:46:32 +02:00
Jan Kessler
b5ddaf6417
make weight for bm25 retriever in hybrid search ui-configurable 2025-05-20 10:39:31 +02:00
sree
f408b08965 minor bug fix for external document loader not working 2025-05-20 11:10:23 +05:30
Derek Wischusen
42be1f956a Add Azure OpenAI embedding support 2025-05-19 22:58:04 -04:00
Marcelo Mendoza
d6ad96affb fix: use get method for title and snippet in search results 2025-05-19 17:24:47 +02:00
Timothy Jaeryang Baek
6692fb2181 chore: format 2025-05-17 01:00:37 +04:00
Kiet Trinh
418ac1a8da refac: Rename Qdrant multi-tenancy variable for improved clarity and consistency 2025-05-15 09:09:24 +00:00
Kiet Trinh
485bd7666c fix: Update Qdrant multi-tenancy variable name for consistency in configuration 2025-05-15 08:02:58 +00:00
LoiTra
184d8dfd7e
feat: Implement Qdrant multi-tenancy support with collection management and tenant isolation 2025-05-15 11:28:06 +07:00
Timothy Jaeryang Baek
b143c71da2 refac: AIOHTTP_CLIENT_SESSION_SSL 2025-05-14 23:33:52 +04:00
Timothy Jaeryang Baek
42382b5167 fix 2025-05-14 22:46:01 +04:00
Timothy Jaeryang Baek
8732b64b6b feat: external document loader support 2025-05-14 22:28:40 +04:00
Timothy Jaeryang Baek
de70d0cb64 feat: docling do picture description support 2025-05-14 21:26:49 +04:00
hwzhuhao
6f869ded43 feat: Add vector type and vector factory class for vector database integration 2025-05-14 21:30:50 +08:00
Timothy Jaeryang Baek
6b5f99bf66 fix: external reranker 2025-05-10 19:33:34 +04:00
Timothy Jaeryang Baek
c61790b355 chore: format 2025-05-10 19:00:01 +04:00
Timothy Jaeryang Baek
d5fd3b3600 feat: external reranker
Co-Authored-By: Brendan Campbell <20541191+bcambs09@users.noreply.github.com>
2025-05-10 18:25:20 +04:00
PVBLIC Foundation
3f58a17e47
Update pinecone.py
- Removed the unused Pinecone REST-client import; we now only import ServerlessSpec and the gRPC client.
- Enhanced close()
  - Call self.client.close() to explicitly shut down the underlying gRPC channel.
  - Log success or a warning on failure.
  - Still tear down the thread-pool executor afterward.
- Context-manager support
  - Added __enter__()/__exit__() so you can do:

with PineconeClient() as client:
    client.insert(...)
# automatically calls client.close()
2025-05-10 06:07:27 -07:00
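The context-manager support described in the commit above follows a standard Python pattern, which might look roughly like this. `DemoVectorClient` is a hypothetical stand-in, not the real PineconeClient: it owns only a thread pool, and its `insert` just counts items instead of calling any API.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

log = logging.getLogger(__name__)


class DemoVectorClient:
    """Hypothetical stand-in for a client that owns a thread-pool executor."""

    def __init__(self) -> None:
        self._executor = ThreadPoolExecutor(max_workers=2)
        self.closed = False

    def insert(self, items: list) -> int:
        # Placeholder for a real insert call: counts the items on a worker thread.
        return self._executor.submit(len, items).result()

    def close(self) -> None:
        """Release resources; safe to call more than once."""
        if self.closed:
            return
        # A real client would also shut down its network channel here.
        self._executor.shutdown(wait=True)
        self.closed = True
        log.info("client closed")

    def __enter__(self) -> "DemoVectorClient":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()
        return False  # do not swallow exceptions raised inside the block
```

Returning False from `__exit__` lets errors raised inside the `with` block propagate while still guaranteeing that `close()` runs.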
PVBLIC Foundation
12c2138982
Update pinecone.py
Refactor and added debug
2025-05-09 18:15:22 -07:00
PVBLIC Foundation
b38711a581
Update pinecone.py 2025-05-08 16:02:47 -07:00
PVBLIC Foundation
04b9065f08
Update pinecone.py
Now supports batched insert, upsert, and delete operations using a default batch size of 100, reducing API strain and improving throughput. All blocking calls to the Pinecone API are wrapped in asyncio.to_thread(...), ensuring async safety and preventing event loop blocking. The implementation includes zero-vector handling for efficient metadata-only queries, normalized cosine distance scores for accurate ranking, and protections against empty input operations. Logs for batch durations have been streamlined to minimize noise, while preserving key info-level success logs.
2025-05-08 15:53:30 -07:00
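The batching and `asyncio.to_thread(...)` wrapping mentioned in the commit above could be sketched as below. The helper names (`chunked`, `batched_upsert`) and the injected `upsert` callable are assumptions for illustration. Only the batch size of 100, the use of `asyncio.to_thread`, and the guard against empty input come from the message.

```python
import asyncio
from typing import Callable, Iterator, Sequence


def chunked(items: Sequence, size: int = 100) -> Iterator[Sequence]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def batched_upsert(
    upsert: Callable[[Sequence], None],
    items: Sequence,
    batch_size: int = 100,
) -> int:
    """Send `items` in batches via a blocking `upsert`; return the batch count."""
    if not items:
        # Guard against empty input: nothing to send, so no API call is made.
        return 0
    batches = 0
    for batch in chunked(items, batch_size):
        # Run the blocking call on a worker thread so the event loop stays free.
        await asyncio.to_thread(upsert, batch)
        batches += 1
    return batches
```

`asyncio.to_thread` requires Python 3.9 or later; on older versions `loop.run_in_executor` serves the same purpose.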
Matt Harrison
2df9f7fb4d fix: remove import for os module in milvus.py 2025-05-08 00:28:24 -04:00
Matt Harrison
731251d11a refac: streamline Milvus index type handling using configuration options 2025-05-07 23:39:56 -04:00
Matt Harrison
5e46c27806 refac: enhance MilvusClient with dynamic index type and improved logging 2025-05-07 21:51:28 -04:00
Timothy Jaeryang Baek
6359cb55fe chore: format 2025-05-07 02:01:03 +04:00
Tim Jaeryang Baek
ea07e242f5
Merge pull request #13528 from Classic298/dev
feat: Enhance YouTube Transcription Loader for multi-language support
2025-05-07 00:44:45 +04:00
Classic298
1dcbec71ec
Update youtube.py 2025-05-06 17:14:00 +02:00
Classic298
87dcbd198c
Update youtube.py 2025-05-06 17:11:03 +02:00
Classic298
d7927506f1
Update youtube.py 2025-05-06 17:06:21 +02:00
Classic298
f65dc715f9
Update youtube.py 2025-05-06 16:30:18 +02:00
Classic298
c69278c13c
Update youtube.py 2025-05-06 16:24:27 +02:00
Classic298
a129e0954e
Update youtube.py 2025-05-06 16:22:40 +02:00
Classic298
5e1cb76b93
Update youtube.py 2025-05-06 16:16:58 +02:00
Timothy Jaeryang Baek
e63b8b3879 refac 2025-05-06 00:46:32 +04:00
Timothy Jaeryang Baek
27da31dc83 fix: tikaloader extract images 2025-05-05 23:40:34 +04:00
Classic298
67a612fe24
Update youtube.py 2025-05-05 20:40:48 +02:00
Classic298
791dd24ace
Update youtube.py 2025-05-05 20:08:25 +02:00
Classic298
9cf3381381
Update youtube.py 2025-05-05 20:07:52 +02:00
Classic298
b0d74a59f1
Update youtube.py 2025-05-05 20:07:37 +02:00
Classic298
1a30b3746e
Update youtube.py 2025-05-05 20:03:00 +02:00
Classic298
0a3817ed86
Update youtube.py 2025-05-05 20:00:10 +02:00
Classic298
0a845db8ec
Update youtube.py 2025-05-05 19:57:21 +02:00
Classic298
7680ac2517
Update youtube.py 2025-05-05 19:57:06 +02:00
Timothy Jaeryang Baek
4cfb99248d chore: format 2025-05-03 23:48:24 +04:00
Athanasios Oikonomou
657162e96d feat(ocr): add support for Docling OCR engine and language configuration
This commit adds support for configuring the OCR engine and language(s) for Docling.
Configuration can be set via the environment variables `DOCLING_OCR_ENGINE` and `DOCLING_OCR_LANG`, or through the UI.

Fixes #13133
2025-05-03 00:32:06 +03:00
Tim Jaeryang Baek
7d184c3a14
Merge pull request #13085 from ayan4m1/fix/tika-image-ocr
fix: pass extractInlineImages header to Tika if PDF_EXTRACT_IMAGES is true
2025-05-02 03:47:51 -07:00
Tim Jaeryang Baek
61580e9490
Merge pull request #13404 from NoMoreFood/dev
fix: Use SHA256 For Query Result Computation
2025-05-01 04:55:16 -07:00
Bryan Berns
32257089f9 Use SHA256 For Query Result Computation 2025-05-01 03:56:20 -04:00
Alexander Grimm
da9966aca1 ~ truncate vectors for pgvector if too big 2025-04-30 05:35:17 +00:00
Tim Jaeryang Baek
4ee5dd58b7
Merge pull request #13177 from tth37/fix_firecrawl_loader_default_mode
fix: FireCrawlLoader default mode to scrape
2025-04-29 08:39:06 -07:00
Tim Jaeryang Baek
e87f2669fa
Merge pull request #13191 from tth37/feat_firecrawl_search_engine
feat: Add Firecrawl search engine
2025-04-29 08:38:28 -07:00
Tim Jaeryang Baek
7b863465a9
Merge pull request #13311 from stephen304/yacy-support
feat: Yacy search support
2025-04-29 08:35:10 -07:00
Stephen Smith
ea16426a8d Remove unused kwargs in yacy, update comments. 2025-04-27 00:41:46 -04:00
Stephen Smith
f9b9217e98 Set Yacy search to text 2025-04-26 23:13:31 -04:00
Stephen Smith
e6d43d70f3 Don't request nav and pass count to Yacy 2025-04-26 23:08:16 -04:00
Stephen Smith
240d91d38d Add yacy config for user/pass, automatically add yacy json api path 2025-04-26 22:28:30 -04:00
Stephen Smith
0f73b96616 first pass at yacy support copied from searxng 2025-04-26 14:07:13 -04:00
tth37
92dbeb1939 feat: Add Firecrawl search engine 2025-04-24 14:57:28 +08:00
tth37
8f7195ceda fix: FireCrawlLoader default mode to scrape 2025-04-24 01:17:35 +08:00
Tim Jaeryang Baek
91e758f3ec
Merge pull request #13165 from feddersen-group/perf/parallel_knowledge_search
perf: all knowledge searches in parallel in non-hybrid mode
2025-04-23 10:01:06 -07:00
Timothy Jaeryang Baek
09874ab83d fix: FireCrawlLoader 2025-04-24 01:40:34 +09:00
Alexander Grimm
d182155fac ~ call knowledge searches in parallel in non-hybrid mode 2025-04-23 09:20:51 +00:00
Tim Jaeryang Baek
faa3cac0e4
Merge pull request #13107 from tth37/fix_tavily_max_results
fix: `max_results` in Tavily search handler
2025-04-22 23:47:36 -07:00
tth37
bc315bd530 fix: max_results in Tavily search api 2025-04-21 20:59:47 +08:00
Athanasios Oikonomou
1e291aff25 feat: Add abstract base class for vector database integration
- Created `VectorDBBase` as an abstract base class to standardize vector database operations.
- Added required methods for common vector database operations: `has_collection`, `delete_collection`, `insert`, `upsert`, `search`, `query`, `get`, `delete`, `reset`.
- The base class can now be extended by any vector database implementation (e.g., Qdrant, Pinecone) to ensure a consistent API across different database systems.
2025-04-21 08:27:27 +03:00
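An abstract base class with the method names listed in the commit above might be declared as follows. The class name `VectorDBSketch` and every parameter and return type are assumptions made for illustration. Only the nine method names come from the message.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class VectorDBSketch(ABC):
    """Illustrative interface; signatures are guesses, not the project's API."""

    @abstractmethod
    def has_collection(self, collection_name: str) -> bool: ...

    @abstractmethod
    def delete_collection(self, collection_name: str) -> None: ...

    @abstractmethod
    def insert(self, collection_name: str, items: list) -> None: ...

    @abstractmethod
    def upsert(self, collection_name: str, items: list) -> None: ...

    @abstractmethod
    def search(self, collection_name: str, vectors: list, limit: int) -> Any: ...

    @abstractmethod
    def query(self, collection_name: str, filter: dict, limit: Optional[int] = None) -> Any: ...

    @abstractmethod
    def get(self, collection_name: str) -> Any: ...

    @abstractmethod
    def delete(self, collection_name: str, ids: Optional[list] = None) -> None: ...

    @abstractmethod
    def reset(self) -> None: ...
```

Because every method is marked with `@abstractmethod`, Python refuses to instantiate a subclass that leaves any of them out, which is what keeps the API consistent across backends.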
ayan4m1
039dec6820 fix: pass header to Tika if PDF_EXTRACT_IMAGES is true 2025-04-20 17:36:40 +02:00
Athanasios Oikonomou
e000c56ef7 feat(vector-db): add support for Pinecone client
Adds Pinecone as a supported vector database option.

- Implements `PineconeClient` with support for common operations: `add`, `query`, `delete`, `reset`.
- Emulates namespace support using metadata filtering (`collection_name` prefix).
- Dynamically configures Pinecone via the following env vars:
  - `PINECONE_API_KEY`
  - `PINECONE_ENVIRONMENT`
  - `PINECONE_INDEX_NAME`
  - `PINECONE_DIMENSION`
  - `PINECONE_METRIC`
  - `PINECONE_CLOUD`
- Integrates cleanly with the vector DB abstraction layer.
- Includes markdown documentation under `docs/getting-started/env-configuration.md`.

BREAKING CHANGE: None
2025-04-20 11:08:51 +03:00
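Reading the environment variables listed in the commit above into one settings object could look like this. The `PineconeSettings` dataclass and all default values are hypothetical placeholders. Only the six variable names come from the message.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PineconeSettings:
    """Hypothetical container for the Pinecone-related environment variables."""

    api_key: str
    environment: str
    index_name: str
    dimension: int
    metric: str
    cloud: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "PineconeSettings":
        # Every default below is a placeholder, not the project's real default.
        return cls(
            api_key=env.get("PINECONE_API_KEY", ""),
            environment=env.get("PINECONE_ENVIRONMENT", ""),
            index_name=env.get("PINECONE_INDEX_NAME", "example-index"),
            dimension=int(env.get("PINECONE_DIMENSION", "1536")),
            metric=env.get("PINECONE_METRIC", "cosine"),
            cloud=env.get("PINECONE_CLOUD", "aws"),
        )
```

Passing the mapping as a parameter makes the loader easy to exercise with a plain dict instead of the real process environment.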
Tim Jaeryang Baek
87844a8042
Merge pull request #12822 from tth37/feat_external_search_loader
feat: Support for Self-Hosted/External Web Search/Loader Engines
2025-04-18 23:51:27 -07:00
Juan Calderon-Perez
6188c0c5b7 Add support for Qdrant gRPC 2025-04-17 01:13:49 -04:00
Juan Calderon-Perez
b4d0d840d1
Fix formatting of qdrant.py 2025-04-15 08:56:51 -04:00
Athanasios Oikonomou
575c12f80c feat: add QDRANT_ON_DISK configuration option for Qdrant integration
This commit will allow configuring the on_disk client parameter, to reduce the memory usage.
https://qdrant.tech/documentation/concepts/storage/?q=mmap#configuring-memmap-storage
Default is false, keeping vectors in memory.
2025-04-15 01:40:57 +03:00
tth37
008fec80c1 fix: Update external search/loader method to POST 2025-04-14 18:17:27 +08:00
tth37
22f0365cef format 2025-04-14 02:05:58 +08:00
tth37
839ba22c90 feat: Backend for Self-Hosted/External Web Search/Loader Engines 2025-04-14 01:49:05 +08:00
Timothy Jaeryang Baek
91a455a284 chore: format 2025-04-12 16:35:11 -07:00
Timothy Jaeryang Baek
48a23ce3fe refac: web/rag config 2025-04-12 16:33:36 -07:00
Tim Jaeryang Baek
62ef0bad6f
Merge pull request #12680 from lucyknada/patch-1
fix #12678
2025-04-10 08:46:41 -07:00
Timothy Jaeryang Baek
63e5200e2f refac 2025-04-10 08:46:12 -07:00
Youggls
3e2a6df1fb feat: Add sougou web search API for backend, add config panel for frontend. 2025-04-10 14:51:44 +08:00
lucy
bc295546cd
fix #12678 2025-04-10 07:23:34 +02:00
Tim Jaeryang Baek
2575dac4ed
Merge pull request #12604 from maurerle/ddg_improve_stacktrace
**fix** improve stack trace of duckduckgo exception
2025-04-08 13:03:57 -07:00
Robert Norberg
2337b36609
add debug logging to RAG utils 2025-04-08 12:08:32 -04:00
Florian Maurer
760ea3f4af
duckduckgo: backend api has been deprecated since december
also increase duckduckgo-search version

see 3ee8e08b1c
2025-04-08 14:02:06 +02:00
Florian Maurer
337c7caafa
improve stack trace of duckduckgo exception
* fix search_results out of scope
* ddgs.text already always returns a list
2025-04-08 13:52:23 +02:00
Timothy Jaeryang Baek
65ed76abe1 refac: embedding prefix 2025-04-06 17:17:24 -07:00
Timothy Jaeryang Baek
ef787e4a79
Merge pull request #12486 from FabioPolito24/text-file-handling-docling
fix: text file handling with docling
2025-04-05 09:55:51 -07:00
Fabio Polito
cd0a1b4852 fix: fix for text file handling with docling 2025-04-05 16:44:08 +00:00
Juan Calderon-Perez
324550423c
Fix formatting issues 2025-04-05 10:03:24 -04:00
Phlogi
8cf8121812
Update utils.py
Avoid running any tasks for collections that failed to fetch data (have assigned None)
2025-04-05 10:41:21 +02:00
Patrick Wachter
0ac00b9256
refactor: update import path for MistralLoader 2025-04-02 13:56:10 +02:00
Patrick Wachter
c5a8d2f857
refactor: update MistralLoader documentation and adjust parameters for signed URL retrieval 2025-04-01 20:14:34 +02:00
Patrick Wachter
93d7702e8c
refactor: move MistralLoader to a separate module and just use the requests package instead of mistralai 2025-04-01 20:14:34 +02:00
Patrick Wachter
1ac6879268
Add Mistral OCR integration and configuration support 2025-04-01 14:24:33 +02:00
Timothy Jaeryang Baek
391dd33da3 chore: format 2025-03-31 17:59:21 -07:00
Timothy Jaeryang Baek
3ba12e7a43
Merge pull request #12239 from Phlogi/dev-threads-on-hybrid
perf: parallelize hybrid search
2025-03-31 17:06:32 -07:00
Timothy Jaeryang Baek
cafc5413f5 refac 2025-03-31 14:13:27 -07:00
Phlogi
9c64310db5
Run hybrid_search in parallel 2025-03-31 16:43:37 +02:00
Timothy Jaeryang Baek
4b75966401 refac: embedding prefix var naming 2025-03-30 21:55:15 -07:00
Timothy Jaeryang Baek
433b5bddc1
Merge pull request #8594 from jayteaftw/main
feat: Support for instruct/prefixing embeddings
2025-03-30 21:54:44 -07:00
Timothy Jaeryang Baek
50b8dec3ac fix/refac: hybrid search 2025-03-30 20:48:22 -07:00
Timothy Jaeryang Baek
ce0d82b55f
Merge pull request #12132 from Phlogi/dev-fetch-documents-once
Avoid multiple data fetching
2025-03-30 20:44:32 -07:00
Junaid Pinjari
e782e7d3a7 Fix: CSV loader encoding issue using autodetect_encoding=True 2025-03-29 13:14:53 +05:30
Phlogi
04bf9ddab2
Avoid multiple data fetching 2025-03-27 19:05:20 +01:00
Timothy Jaeryang Baek
4a79320253 chore: format 2025-03-27 01:40:28 -07:00
Timothy Jaeryang Baek
7490bc9100
Merge branch 'dev' into fix-db-order 2025-03-26 20:55:42 -07:00
Timothy Jaeryang Baek
9d834a8e90
Merge branch 'dev' into k_reranker 2025-03-26 20:50:31 -07:00
Marko Henning
7531b7dcaa Satisfy github format check 2025-03-25 19:09:17 +01:00
Iván Baldo
115e46a6a2 Fix: Tika 3.1.0.0 sends a lot of blank lines which degrades the RAG results, strip them. 2025-03-25 14:53:14 -03:00
Marko Henning
94d9d3d590 Fix: Normalize all database distances to scores in [0, 1] 2025-03-25 16:46:14 +01:00
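Mapping backend-specific distances onto a common [0, 1] score, as the commit above describes, can be illustrated for the cosine case. Both helpers below are assumptions about one reasonable mapping, not the code from the commit: they assume cosine distance lies in [0, 2] and cosine similarity in [-1, 1], and they clamp to absorb small floating-point overshoot.

```python
def cosine_distance_to_score(distance: float) -> float:
    """Map cosine distance in [0, 2] to a score in [0, 1]; distance 0 scores 1."""
    return min(1.0, max(0.0, 1.0 - distance / 2.0))


def cosine_similarity_to_score(similarity: float) -> float:
    """Map cosine similarity in [-1, 1] to a score in [0, 1]; similarity 1 scores 1."""
    return min(1.0, max(0.0, (similarity + 1.0) / 2.0))
```

Once every backend reports scores on the same scale with higher meaning more relevant, results from different vector databases can be sorted and thresholded by the same code.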
Timothy Jaeryang Baek
38d524f6a0 chore: format 2025-03-24 11:35:32 -07:00
Jonathan Flower
bdd236fa3a improved error handling for deleting collections that do not exist in chromadb 2025-03-22 09:59:06 -04:00
Timothy Jaeryang Baek
8aa6dade41
Merge pull request #11876 from mahenning/fix--rag-sorting
Fix: wrong citation order for chromadb, wrong order for hybrid search
2025-03-20 17:54:22 -07:00
Timothy Jaeryang Baek
9b20ef4922 refac 2025-03-20 14:01:47 -07:00
genjuro
07098c6352 perf: set shorter timeout for playwright and make it configurable 2025-03-20 15:28:09 +08:00
Marko Henning
5f48af5b91 Revert the ordering change with chromadb, not necessary with reranker results 2025-03-19 17:04:45 +01:00
Marko Henning
ec8fc727b8 Fix wrong order for chromadb 2025-03-19 16:06:10 +01:00
leilibj
3e8546135d
fix: correct incorrect usage of log.exception method 2025-03-19 13:04:34 +08:00
Marko Henning
5ab789e83e Add documentation on chroma special case 2025-03-18 16:44:58 +01:00
Marko Henning
ba676b7ed6 Use k_reranker also for result merge, and add special sorting use case for ChromaDB 2025-03-18 16:25:24 +01:00
Marko Henning
f13948d805 Fixed typo 2025-03-18 12:14:59 +01:00
Marko Henning
c877b59cbc Address edge case with k < k_reranker, sort results for cutting off 2025-03-18 11:31:17 +01:00
orenzhang
c761e4fd08
feat(trace): opentelemetry instrument 2025-03-10 22:27:31 +08:00
Fabio Polito
9d6743824e fix: fix params DoclingLoader 2025-03-09 16:12:14 +00:00
Fabio Polito
0aa42615f9 Merge remote-tracking branch 'upstream/dev' into docling_context_extraction_engine
merge upstream
2025-03-08 18:52:51 +00:00
Timothy Jaeryang Baek
22b88f9593
Merge pull request #11324 from kela4/main
fix: opensearch vector db query structures, result mapping, filters, bulk query actions, knn_vector usage
2025-03-08 12:19:38 -04:00
Luke
7917128ed3 enh: enable configuration for tavily extract depth 2025-03-08 00:43:02 -05:00
Fabio Polito
e3eef58310 feat: merge with dev 2025-03-07 00:22:47 +00:00
Luke
987954c817 feat: Add Tavily extract web loader integration 2025-03-06 18:15:18 -05:00
Katharina
6cb0c0339a fix: opensearch vector db query structures, result mapping, filters, bulk query actions, knn_vector usage 2025-03-06 23:49:54 +01:00
Fabio Polito
98857184ff Merge remote-tracking branch 'upstream/dev' into docling_context_extraction_engine
merge with dev branch
2025-03-06 12:12:50 +00:00
Marko Henning
41a4cf7106 Added new k_reranker parameter 2025-03-06 10:47:57 +01:00
Timothy Jaeryang Baek
d4fca9dabf chore: format 2025-03-05 19:17:41 -08:00
Fabio Polito
0716f96da8 style: change style in DoclingLoader 2025-03-05 23:15:55 +00:00
Fabio Polito
9aa407dbd2 feat: merge with main 2025-03-05 22:04:34 +00:00
ofek
a8f205213c fixed es bugs 2025-03-05 23:19:56 +02:00
Fabio Polito
a44b35e99e fix: fix DoclingLoader input params 2025-03-05 17:53:45 +00:00
Timothy Jaeryang Baek
7b442e4be0
Merge pull request #11141 from Youggls/dev
fix: correct parameter name for MilvusClient instantiation
2025-03-04 00:54:49 -08:00
Timothy Jaeryang Baek
39ea59edc8 chore: format 2025-03-04 00:32:27 -08:00
Perry Li
67ed61d022
fixbug: correct parameter name for MilvusClient instantiation
Replace incorrect parameter 'database=MILVUS_DB' with valid 'db_name=MILVUS_DB'
2025-03-04 16:02:19 +08:00
ofek
737dfd2763 added elasticsearch support 2025-03-03 23:39:42 +02:00
Timothy Jaeryang Baek
6471f12668
Merge pull request #11033 from dtaivpp/main
fix: Changed to use collection_name and fixed bulk indexing missing index.
2025-03-01 16:00:13 -08:00
David Tippett
f3c4c2b8e3
Changed to use collection name and fixed bulk indexing missing index. 2025-03-01 13:26:19 -05:00
Timothy Jaeryang Baek
d0ddb0637e enh: web embed bypass embedding and retrieval support 2025-02-27 16:34:05 -08:00
Timothy Jaeryang Baek
1b56a8f3cb
Merge pull request #10864 from kurtdami/perplexity_integration
feat: add perplexity integration to web search
2025-02-27 13:51:03 -08:00
kurtdami
b061775932 feat: add perplexity integration to web search 2025-02-27 00:30:48 -08:00
Timothy Jaeryang Baek
ce7cf62a55 refac: dedup 2025-02-26 23:51:39 -08:00
Timothy Jaeryang Baek
ddb30589e3 chore: format
HIDE MODELS
2025-02-26 22:18:18 -08:00
Timothy Jaeryang Baek
57010901e6 enh: bypass embedding and retrieval 2025-02-26 15:42:19 -08:00
Timothy Jaeryang Baek
34aeaaf020 refac 2025-02-26 13:54:26 -08:00
Timothy Jaeryang Baek
46ac6f2b29 fix 2025-02-26 12:53:07 -08:00
Timothy Jaeryang Baek
33d3558ca9
Merge pull request #10817 from NovoNordisk-OpenSource/ivaroli/adding-json-as-supported-file-type
fix: Using the TextLoader instead of Tika for JSON files
2025-02-26 12:49:29 -08:00
Ívar Óli Sigurðsson
c5a09cdd21 adding a comma 2025-02-26 15:27:03 +01:00
Ívar Óli Sigurðsson
661711164a Adding json as a known source for Tika 2025-02-26 15:11:21 +01:00
Timothy Jaeryang Baek
3be5e3129b
Merge pull request #10752 from NovoNordisk-OpenSource/yvedeng/standardize-logging
refactor: replace print statements with logging
2025-02-25 10:53:02 -08:00
Yifang Deng
0e5d5ecb81
refactor: replace print statements with logging for better error tracking 2025-02-25 15:53:55 +01:00
Timothy Jaeryang Baek
ab1b910d80
Merge pull request #10486 from Micca/feature/document_intelligence_support
Feat: Adding Support for Azure AI Document Intelligence for Content Extraction (Revised)
2025-02-21 10:56:18 -08:00
Timothy Jaeryang Baek
93d486d50e revert: faulty dedup code 2025-02-20 11:02:45 -08:00
Timothy Jaeryang Baek
eeb00a5ca2 chore: format 2025-02-20 01:01:29 -08:00
Youggls
0fb3c08181 feat: Add Firecrawl web loader integration 2025-02-19 16:54:44 +08:00
Timothy Jaeryang Baek
c073b8b4ee refac 2025-02-18 23:49:27 -08:00
Timothy Jaeryang Baek
5465cabd40 refac 2025-02-18 21:17:09 -08:00
Timothy Jaeryang Baek
81715f6553 enh: RAG full context mode 2025-02-18 21:14:58 -08:00
Timothy Jaeryang Baek
1bbecd46c8
Merge pull request #10052 from roryeckel/playwright
Support Playwright RAG Web Loader: Revised
2025-02-18 19:57:48 -08:00
Timothy Jaeryang Baek
4ef7aff663 refac 2025-02-18 19:35:22 -08:00
mikhail-khludnev
925bfe840b dedupe results from multiple queries 2025-02-18 20:10:57 +03:00
Rory
10e0c81de9 Merge remote-tracking branch 'upstream/dev' into playwright
# Conflicts:
#	backend/open_webui/retrieval/web/utils.py
#	backend/open_webui/routers/retrieval.py
2025-02-17 21:53:39 -06:00
Rory
bc82f48ebf refac: RAG_WEB_LOADER -> RAG_WEB_LOADER_ENGINE 2025-02-17 21:43:32 -06:00
Timothy Jaeryang Baek
ba6cde8a87 fix: include_domain does NOT exist 2025-02-17 19:20:49 -08:00
Timothy Jaeryang Baek
dbe5d1ca08 refac 2025-02-17 18:16:23 -08:00
Timothy Jaeryang Baek
ca0b7217d2 enh: full context web search 2025-02-17 18:14:26 -08:00
Rory
66c2acc08d Merge branch 'dev' into playwright 2025-02-15 22:14:16 -06:00
Timothy Jaeryang Baek
b0ad5cd863
Merge pull request #10076 from crizCraig/local_date
fix: return local date from `getFormattedDate`
2025-02-15 20:10:56 -08:00
Timothy Jaeryang Baek
3d0c06ccee refac: duckduckgo 2025-02-15 16:45:56 -08:00
Craig Quiter
e67eb89e05 style: black format 2025-02-15 10:53:16 -08:00
Rory
8e9b00a017 Fix docstring 2025-02-14 22:48:15 -06:00
Rory
aa2b764d74 Finalize incomplete merge to update playwright branch
Introduced feature parity for trust_env
2025-02-14 22:32:45 -06:00
Rory
4da220c513 Merge remote-tracking branch 'upstream/dev' into playwright
# Conflicts:
#	backend/open_webui/config.py
#	backend/open_webui/main.py
#	backend/open_webui/retrieval/web/utils.py
#	backend/open_webui/routers/retrieval.py
#	backend/open_webui/utils/middleware.py
#	pyproject.toml
2025-02-14 20:48:22 -06:00
Guofeng Yi
b38acc8559
Merge branch 'dev' into feate-webloader-support-proxy 2025-02-15 09:50:02 +08:00
Timothy Jaeryang Baek
3e543691a4
Merge pull request #9988 from Yimi81/feat-support-async-load
feat: websearch support async docs load
2025-02-14 14:10:46 -08:00
LiuC0j
5ca39eb9fd
Update tavily.py 2025-02-14 14:56:01 +01:00
Fabio Polito
2419ef06a0 feat: docling support for document preprocessing 2025-02-14 12:08:03 +00:00
Yimi81
d3f71930f0 web loader support proxy 2025-02-14 07:15:09 +00:00
Yimi81
ceef600223 support async load for websearch 2025-02-14 07:05:10 +00:00
xring
27d395ba06 feat: add web search via SerpApi 2025-02-14 12:24:58 +08:00
Timothy Jaeryang Baek
5626426c31 chore: format 2025-02-12 23:28:57 -08:00
Rory
40d4db97e6 Merge remote-tracking branch 'upstream/dev' into playwright 2025-02-12 22:32:44 -06:00
Timothy Jaeryang Baek
a5bba20915
Merge pull request #9837 from silverriver/patch-1
feat Make Google PSE search return more than 10 google search results
2025-02-11 21:36:53 -08:00
Silver
7e08373ae5
Update google_pse.py to return results more than 10 2025-02-12 13:01:09 +08:00
Timothy Jaeryang Baek
8906a2e260
Merge pull request #9803 from BochaAI/main
add Bocha
2025-02-11 21:01:04 -08:00
luckyman-yan
31360fe991 add Bocha 2025-02-10 16:44:47 +08:00
Timothy Jaeryang Baek
60095598ec chore: format 2025-02-09 22:20:47 -08:00
Rory
2c711d8365 Merge remote-tracking branch 'upstream/dev' into playwright
# Conflicts:
#	backend/requirements.txt
2025-02-09 23:52:21 -06:00
Timothy Jaeryang Baek
d5a815b19c
Merge pull request #9693 from vinsdragonis/main
fix: Fixed error occurring when using OpenSearch as a vector db
2025-02-09 13:06:19 -08:00
Mazurek Michal
35f3824932 feat: Implement Document Intelligence as Content Extraction Engine 2025-02-07 13:44:47 +01:00
binxn
88db4ca7ba
Update jina_search.py
Updated Jina's search function in order to use POST and make use of the result count passed by the user

Note: Jina supports a maximum result count of 10
2025-02-06 14:30:27 +01:00
Vineeth B V
7c78facfd9
Update opensearch.py 2025-02-06 13:36:11 +05:30
Vineeth B V
fd6b039859
Added a query method for OpenSearch vector db.
- This PR aims to address the error 400: "**'OpenSearchClient' object has no attribute 'query'**".
- With the implemented query() method, this issue should be resolved and allow uploaded documents to be vectorized and retrieved based on the given query.
2025-02-06 12:04:14 +05:30
Rory
ec6fe9939b Merge remote-tracking branch 'upstream/dev' into playwright 2025-02-05 17:47:58 -06:00
JT
40dea3fbe1
Merge branch 'dev' into main 2025-02-05 15:15:24 -08:00
jayteaftw
157c781b0a Merge branch 'main' of https://github.com/jayteaftw/open-webui 2025-02-05 14:07:59 -08:00
jayteaftw
6d2f87e904 Added server side Prefixing 2025-02-05 14:03:16 -08:00
Timothy Jaeryang Baek
e41a2682f5 chore: format 2025-02-05 00:07:45 -08:00
Timothy Jaeryang Baek
f6f8c08cb0
Merge pull request #9068 from df-cgdm/main
**feat** Add user related headers when calling an external embedding api
2025-02-05 00:05:44 -08:00
Timothy Jaeryang Baek
5cda8a57e7
Merge pull request #9337 from abdalrohman/exa_integration
feat: implement Exa search engine integration
2025-02-04 14:00:06 -08:00
JT
81102f4be2
Merge branch 'open-webui:main' into main 2025-02-04 13:06:04 -08:00
jvinolus
7b8e5d4e7c Fixed errors and added more support 2025-02-04 13:04:36 -08:00
M.Abdulrahman Alnaseer
2bb6b49f11 feat: implement Exa search engine integration 2025-02-04 21:13:16 +03:00
Timothy Jaeryang Baek
3adfa29f7d chore: format 2025-02-03 21:56:35 -08:00
Rory
7bac1a170d Merge remote-tracking branch 'upstream/dev' into playwright
# Conflicts:
#	backend/open_webui/retrieval/web/utils.py
2025-02-03 22:32:46 -06:00
Rory
1b581b714f Moving code out of playwright branch 2025-02-03 18:47:26 -06:00
Rory
3db6b4352f fix: Filter out invalid RAG web URLs (continued) 2025-02-03 18:18:49 -06:00
Rory
121a13d4ed fix: Filter to valid RAG web search URLs 2025-02-03 17:37:20 -06:00
Rory
f837d2cdbb Merge branch 'dev' of https://github.com/open-webui/open-webui
# Conflicts:
#	src/lib/i18n/locales/sr-RS/translation.json
2025-02-02 20:31:27 -06:00
Rory
8da33721d5 Support PLAYWRIGHT_WS_URI 2025-02-02 17:58:09 -06:00
Rory
a84e488a4e Fix playwright in docker by updating unstructured 2025-02-01 22:58:28 -06:00
Sajid Ali
7b31c75271 Milvus: new optional config var, MILVUS_TOKEN
modified:   backend/open_webui/config.py
modified:   backend/open_webui/retrieval/vector/dbs/milvus.py
2025-01-31 17:01:00 -05:00