qdrant-hybrid-search

qdrant

Updated 5 days ago

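About

This skill explains how to implement hybrid search in Qdrant, using the Query API's `prefetch` feature to run keyword search and semantic search in parallel. It guides developers in combining sparse and dense vector results to solve problems such as missed keyword matches. Use it when setting up combined search methods or handling multiple search representations.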

Documentation

Hybrid Search in Qdrant

Hybrid search means running two or more different searches in parallel and combining their results into one.

In Qdrant this is powered by the Query API via prefetch: each prefetch runs exactly one type of search independently, and the outer query combines the results of the parallel prefetches.
Prefetches can be nested and searches can be multi-stage, with the whole pipeline executed in a single Query API request. See Universal Query API for examples.
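
A minimal sketch of such a request, written as a plain Python dict so it can be sent as JSON to the `POST /collections/{collection_name}/points/query` endpoint. The vector names `dense` and `sparse`, the limits, and the toy vector values are assumptions for illustration, not something this skill prescribes.

```python
import json


def build_hybrid_query(dense_vector, sparse_indices, sparse_values,
                       prefetch_limit=20, limit=10):
    """Build a Query API body that fuses one sparse and one dense prefetch."""
    return {
        "prefetch": [
            {
                # Keyword-style search over the sparse named vector (assumed name).
                "query": {"indices": sparse_indices, "values": sparse_values},
                "using": "sparse",
                "limit": prefetch_limit,
            },
            {
                # Semantic search over the dense named vector (assumed name).
                "query": dense_vector,
                "using": "dense",
                "limit": prefetch_limit,
            },
        ],
        # The outer query combines both prefetches with reciprocal rank fusion.
        "query": {"fusion": "rrf"},
        "limit": limit,
    }


body = build_hybrid_query([0.01, 0.45, 0.67], [1, 42], [0.22, 0.8])
print(json.dumps(body, indent=2))
```

Each prefetch names exactly one vector through `using`, which matches the rule that one prefetch powers one type of search.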

Identify the user's problem and pick building blocks:

What can go into one prefetch, i.e. what can power a single search: see Search Types
How to combine the results of these searches (RRF, DBSF, FormulaQuery, reranking): see Combining Searches
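
To make the first combining option concrete, here is a sketch of the textbook reciprocal rank fusion (RRF) formula in plain Python. The constant `k=60`, the 1-based ranks, and the toy point ids are assumptions taken from the generic formulation; Qdrant's internal constant and tie-breaking may differ.

```python
from collections import defaultdict


def rrf(rankings, k=60):
    """Fuse several ranked lists of point ids with reciprocal rank fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, point_id in enumerate(ranking, start=1):
            scores[point_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], str(kv[0])))


keyword_hits = ["a", "b", "c"]   # hypothetical sparse-search ranking
semantic_hits = ["c", "a", "d"]  # hypothetical dense-search ranking
fused = rrf([keyword_hits, semantic_hits])
print([point_id for point_id, _ in fused])
```

RRF only looks at ranks, so it does not require sparse and dense scores to be on comparable scales.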

Based on what you've picked, test your approach:

Configure a Qdrant collection with named vectors, where each named vector usually corresponds to one representation of a data point (a different embedding model or a different vector type).
Construct a hybrid search request with the Query API from your building blocks. To search one type of vector independently, combine prefetch with using, as shown in the examples in the Hybrid Queries documentation.
Evaluate hybrid search quality on real user data and present the user with possible improvements and their tradeoffs (speed/resources).
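
The configuration step can be sketched as two JSON bodies built in Python: one for `PUT /collections/{collection_name}` and one for `PUT /collections/{collection_name}/points`. The vector names `dense` and `sparse`, the size 384, the cosine distance, and the sample point are assumptions for illustration only.

```python
def build_collection_config(dense_size=384):
    """Collection body with one dense and one sparse named vector."""
    return {
        "vectors": {
            # One named dense vector per embedding model (assumed name and size).
            "dense": {"size": dense_size, "distance": "Cosine"},
        },
        "sparse_vectors": {
            # Sparse named vector with the IDF modifier enabled.
            "sparse": {"modifier": "idf"},
        },
    }


def build_point(point_id, dense_vector, sparse_indices, sparse_values, payload):
    """A single point that fills both named vectors."""
    return {
        "id": point_id,
        "vector": {
            "dense": dense_vector,
            "sparse": {"indices": sparse_indices, "values": sparse_values},
        },
        "payload": payload,
    }


config = build_collection_config(dense_size=3)
upsert_body = {
    "points": [build_point(1, [0.1, 0.2, 0.3], [7, 19], [1.0, 0.5], {"title": "example"})]
}
print(config)
print(upsert_body)
```

Keeping both representations on the same point lets a single Query API request prefetch from either vector by name.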

How Isolated Are Parallel Searches?

Use when: different tenants share one collection and you need to understand hybrid search isolation guarantees.

If the user wants to isolate or share hybrid search pipelines between tenants, keep in mind that:

Indexes (sparse, payload, and dense) and the IDF modifier for sparse vectors are computed independently per shard, not per tenant.
A prefetch runs independently on each shard and retrieves limit results per shard, so for a collection-level prefetch on a collection with several shards, Qdrant always prefetches limit × (number of shards) results under the hood. The final results are merged by score.
In nested prefetches (deeper than one level), the methods described in "Combining Searches" may be applied at the shard level first, after which the per-shard results are merged again by score.
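
The per-shard behaviour of a single prefetch can be illustrated with a toy simulation. This is not Qdrant's internal code; the shard contents and scores below are made up, and only the arithmetic (each shard returns its own top `limit`, then the candidates are merged by score) follows the description above.

```python
import heapq


def prefetch_across_shards(shards, limit):
    """Simulate one prefetch: top `limit` per shard, then a merge by score.

    `shards` is a list of lists of (point_id, score) tuples.
    Returns (all_candidates, merged_top_limit).
    """
    per_shard = [
        heapq.nlargest(limit, shard, key=lambda hit: hit[1]) for shard in shards
    ]
    # Up to limit * number_of_shards candidates are fetched under the hood.
    candidates = [hit for hits in per_shard for hit in hits]
    merged = heapq.nlargest(limit, candidates, key=lambda hit: hit[1])
    return candidates, merged


# Hypothetical two-shard collection with made-up scores.
shards = [
    [("a", 0.9), ("b", 0.4), ("c", 0.3)],
    [("d", 0.8), ("e", 0.7), ("f", 0.1)],
]
candidates, merged = prefetch_across_shards(shards, limit=2)
print(len(candidates), [point_id for point_id, _ in merged])
```

Because scores from different shards are compared directly in the merge, per-shard statistics such as IDF can influence which tenant's points survive.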

What NOT to Do

Choose a hybrid search pattern based on "vibes", without any hybrid search quality evaluation in place.
Create more named vectors than you need. An unfilled named vector might take as many resources as a filled one.
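
One simple way to replace "vibes" with a measurement is recall@k over a small labelled query set. The sketch below assumes you already have, for each query, the retrieved ids per pipeline and a set of relevant ids; all ids and judgments here are hypothetical placeholders, not real results.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def mean_recall_at_k(runs, judgments, k):
    """Average recall@k over all judged queries."""
    values = [recall_at_k(runs.get(q, []), rel, k) for q, rel in judgments.items()]
    return sum(values) / len(values)


# Hypothetical relevance judgments and pipeline outputs (illustration only).
judgments = {"q1": {"a", "b"}, "q2": {"c"}}
dense_only = {"q1": ["a", "x", "y"], "q2": ["z", "w", "c"]}
hybrid = {"q1": ["a", "b", "x"], "q2": ["c", "z", "w"]}

dense_score = mean_recall_at_k(dense_only, judgments, k=2)
hybrid_score = mean_recall_at_k(hybrid, judgments, k=2)
print(f"dense-only recall@2: {dense_score:.2f}")
print(f"hybrid recall@2: {hybrid_score:.2f}")
```

Running the same comparison on real user queries, alongside latency and resource usage, gives a basis for the tradeoff discussion mentioned earlier.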

GitHub repository

qdrant/skills

Path: skills/qdrant-search-quality/search-strategies/hybrid-search
