Hisma
|
a99e20cc3d
|
add format_lines
|
2025-07-22 21:06:29 -04:00 |
|
Hisma
|
f31cc07a9d
|
feat: update marker api
|
2025-07-22 20:49:28 -04:00 |
|
expruc
|
453a2bd9b5
|
fixed issue where text/html files were being detected as text when loaded
|
2025-07-06 20:10:26 +03:00 |
|
Tim Jaeryang Baek
|
600344f2e8
|
Merge pull request #15510 from kopero2000/bug/oauth_logout_fix
fix/oauth logout fix
|
2025-07-04 10:30:02 +04:00 |
|
Bela Vizi
|
9623ef4360
|
add trust env to clientsession
|
2025-07-02 17:59:56 +02:00 |
|
Timothy Jaeryang Baek
|
81b8267e85
|
feat: odt file parse support
|
2025-06-19 18:39:00 +04:00 |
|
Timothy Jaeryang Baek
|
7753f57d42
|
chore: format
|
2025-06-16 13:48:50 +04:00 |
|
Tim Jaeryang Baek
|
c5b48ec551
|
Merge pull request #14992 from sreesdas/dev
Fix: Added support for multiple pages in external document loader
|
2025-06-16 11:01:33 +04:00 |
|
sree
|
62bfe73964
|
Fix: Added support for multiple pages in external document loader, added filename in api request header
|
2025-06-15 19:59:05 +05:30 |
|
Vaclav Cerny
|
4bbc32efa6
|
fix: serialize picture description parameters to JSON in DoclingLoader
|
2025-06-11 20:00:25 +02:00 |
|
Timothy Jaeryang Baek
|
7f75acff96
|
chore: format
|
2025-06-08 22:08:25 +04:00 |
|
Timothy Jaeryang Baek
|
0cd400f5ee
|
refac: docling picture describe params
|
2025-06-08 20:02:14 +04:00 |
|
Tim Jaeryang Baek
|
6bf393a480
|
Merge pull request #14787 from vaclcer/vaclavs-custom-docling
feat: Customize Docling's "Describe Pictures" feature
|
2025-06-08 19:02:36 +04:00 |
|
Tim Jaeryang Baek
|
50d9a2ac58
|
Merge pull request #14781 from lucyknada/patch-2
fix: fix #14752 and add manual transcription retrieval
|
2025-06-08 18:40:28 +04:00 |
|
Vaclav Cerny
|
99f05561f8
|
Add configuration options for picture description modes and update related components
|
2025-06-08 16:30:26 +02:00 |
|
lucy
|
b0965a8184
|
fixes #14752 and adds manual transcription option
|
2025-06-08 14:26:24 +02:00 |
|
Timothy Jaeryang Baek
|
5e35aab292
|
chore: format
|
2025-06-05 01:12:28 +04:00 |
|
Vaclav Cerny
|
9772c18b20
|
fix(loader): remove deprecated picture description configuration
|
2025-06-04 17:21:44 +02:00 |
|
Vaclav Cerny
|
c71236ba07
|
feat(loader): enhance picture description prompt for improved detail and clarity
|
2025-06-04 14:25:31 +02:00 |
|
Vaclav Cerny
|
c4278f4784
|
fix description vs classification mismatch
|
2025-06-04 14:13:00 +02:00 |
|
Vaclav Cerny
|
8644e81a1c
|
feat(loader): add picture description configuration for DoclingLoader
|
2025-06-04 12:34:39 +02:00 |
|
Timothy Jaeryang Baek
|
4d364e2967
|
refac: remove msg from known type
|
2025-06-03 16:27:28 +04:00 |
|
PVBLIC Foundation
|
cf3635ba25
|
Update mistral.py
1. Intelligent Error Handling
   - Added _is_retryable_error() method to distinguish retryable vs non-retryable errors
   - Prevents unnecessary retries on client errors (4xx) that won't succeed
   - Caps retry delay at 30 seconds to prevent excessive waiting
2. Optimized Timeout Configuration
   - Upload: Capped at 2 minutes (was using full 5-minute timeout)
   - URL requests: 30 seconds (should be fast)
   - OCR processing: Full timeout (can take time)
   - Cleanup: 30 seconds (should be quick)
3. Enhanced Connection Pool
   - Increased connection limits: 20 total, 10 per host
   - Longer DNS cache TTL (10 minutes vs 5 minutes)
   - Increased keepalive timeout (60s vs 30s)
   - Added async DNS resolver for better performance
   - Granular timeout controls (connect, read, total)
4. Concurrency Control for Batch Processing
   - Added semaphore-based concurrency control (default: 5 concurrent)
   - Prevents overwhelming the API while maintaining throughput
   - Configurable concurrency limit per workload
5. Memory Efficient Result Processing
   - Early exit for empty content validation
   - Better error metadata for debugging
   - Added content length tracking
   - Streamlined page processing logic
6. General Performance Improvements
   - Better error logging with truncated responses
   - Optimized metadata creation
   - Improved debug logging efficiency
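The retry and concurrency items in this commit message can be illustrated with a minimal sketch. This is not the code from mistral.py: the names `is_retryable_status`, `retry_delay`, and `process_all` are hypothetical, the exponential backoff formula is an assumption, and treating HTTP 429 as retryable is also an assumption (the message only says that 4xx client errors are not retried). Only the 30-second cap and the default of 5 concurrent tasks come from the message itself.

```python
import asyncio

MAX_RETRY_DELAY = 30.0  # seconds; the commit message caps retry delay at 30 s


def is_retryable_status(status: int) -> bool:
    """Hypothetical classifier: retry 5xx (and, by assumption, 429); skip other 4xx."""
    return status == 429 or 500 <= status < 600


def retry_delay(attempt: int, base: float = 1.0) -> float:
    """Assumed exponential backoff (base * 2**attempt), capped at MAX_RETRY_DELAY."""
    return min(base * (2 ** attempt), MAX_RETRY_DELAY)


async def process_all(items, worker, max_concurrent: int = 5):
    """Run worker(item) for every item with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item):
        # Each task waits here until one of the max_concurrent slots is free.
        async with semaphore:
            return await worker(item)

    # gather() returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))
```

A caller would pass its own coroutine as `worker` (for example, one that uploads and OCRs a single file) and tune `max_concurrent` per workload, as the message describes.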
|
2025-05-30 20:06:29 -07:00 |
|
Timothy Jaeryang Baek
|
7dc7d5c028
|
refac: PLEASE FOLLOW EXISTING CONVENTION
|
2025-05-29 03:47:02 +04:00 |
|
Timothy Jaeryang Baek
|
551597b9cc
|
chore: format
|
2025-05-29 02:36:33 +04:00 |
|
Hisma
|
e12a79c0e2
|
fix: handle json output format correctly
|
2025-05-27 01:12:03 -04:00 |
|
Hisma
|
a9405cc101
|
feat: Marker api content extraction support
|
2025-05-27 00:44:07 -04:00 |
|
Timothy Jaeryang Baek
|
8b5e89eada
|
chore: format
|
2025-05-24 00:43:38 +04:00 |
|
PVBLIC Foundation
|
bf193dfb5d
|
Update mistral.py
|
2025-05-23 10:00:19 -07:00 |
|
sree
|
f408b08965
|
minor bug fix for external document loader not working
|
2025-05-20 11:10:23 +05:30 |
|
Timothy Jaeryang Baek
|
8732b64b6b
|
feat: external document loader support
|
2025-05-14 22:28:40 +04:00 |
|
Timothy Jaeryang Baek
|
de70d0cb64
|
feat: docling do picture description support
|
2025-05-14 21:26:49 +04:00 |
|
Timothy Jaeryang Baek
|
6359cb55fe
|
chore: format
|
2025-05-07 02:01:03 +04:00 |
|
Tim Jaeryang Baek
|
ea07e242f5
|
Merge pull request #13528 from Classic298/dev
feat: Enhance YouTube Transcription Loader for multi-language support
|
2025-05-07 00:44:45 +04:00 |
|
Classic298
|
1dcbec71ec
|
Update youtube.py
|
2025-05-06 17:14:00 +02:00 |
|
Classic298
|
87dcbd198c
|
Update youtube.py
|
2025-05-06 17:11:03 +02:00 |
|
Classic298
|
d7927506f1
|
Update youtube.py
|
2025-05-06 17:06:21 +02:00 |
|
Classic298
|
f65dc715f9
|
Update youtube.py
|
2025-05-06 16:30:18 +02:00 |
|
Classic298
|
c69278c13c
|
Update youtube.py
|
2025-05-06 16:24:27 +02:00 |
|
Classic298
|
a129e0954e
|
Update youtube.py
|
2025-05-06 16:22:40 +02:00 |
|
Classic298
|
5e1cb76b93
|
Update youtube.py
|
2025-05-06 16:16:58 +02:00 |
|
Timothy Jaeryang Baek
|
e63b8b3879
|
refac
|
2025-05-06 00:46:32 +04:00 |
|
Timothy Jaeryang Baek
|
27da31dc83
|
fix: tikaloader extract images
|
2025-05-05 23:40:34 +04:00 |
|
Classic298
|
67a612fe24
|
Update youtube.py
|
2025-05-05 20:40:48 +02:00 |
|
Classic298
|
791dd24ace
|
Update youtube.py
|
2025-05-05 20:08:25 +02:00 |
|
Classic298
|
9cf3381381
|
Update youtube.py
|
2025-05-05 20:07:52 +02:00 |
|
Classic298
|
b0d74a59f1
|
Update youtube.py
|
2025-05-05 20:07:37 +02:00 |
|
Classic298
|
1a30b3746e
|
Update youtube.py
|
2025-05-05 20:03:00 +02:00 |
|
Classic298
|
0a3817ed86
|
Update youtube.py
|
2025-05-05 20:00:10 +02:00 |
|
Classic298
|
0a845db8ec
|
Update youtube.py
|
2025-05-05 19:57:21 +02:00 |
|