qdrant-hybrid-search-combining

qdrant

Updated 5 days ago


About

This skill helps developers combine sparse and dense search results in Qdrant using fusion methods such as RRF. It covers scenarios where scores are not comparable across different search types or where custom fusion is needed. The skill supports both flat and nested prefetch structures for hybrid search optimization.

Quick Install

Claude Code

Recommended

Primary

npx skills add qdrant/skills -a claude-code

Plugin Command (Alternative)

/plugin add https://github.com/qdrant/skills

Git Clone (Alternative)

git clone https://github.com/qdrant/skills.git ~/.claude/skills/qdrant-hybrid-search-combining

Copy and paste this command into Claude Code to install this skill.

Documentation

Combining Prefetch Results

The outer query fuses the ranked candidate lists from all parallel prefetches into one ranked list of results. Fusion methods differ in what they use (ranks, scores, or the candidates' vector representations directly, that is, their similarity to the outer query) and in whether the final score incorporates payload metadata. All methods support flat (one fusion step) and nested (multi-stage) prefetch structures.
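To make the flat versus nested distinction concrete, here is a sketch of the two request shapes as plain Python dictionaries. The field names follow the Qdrant Query API REST body as commonly documented, so check them against the current API reference. The named vectors `dense` and `sparse`, the limits, and the toy query values are assumptions for illustration only.

```python
import json

# Hypothetical named vectors ("dense", "sparse") and toy query values.
dense_query = [0.12, 0.56, 0.33]
sparse_query = {"indices": [7, 42], "values": [0.8, 0.4]}

# Flat: two parallel prefetches, one fusion step in the outer query.
flat_request = {
    "prefetch": [
        {"query": sparse_query, "using": "sparse", "limit": 50},
        {"query": dense_query, "using": "dense", "limit": 50},
    ],
    "query": {"fusion": "rrf"},
    "limit": 10,
}

# Nested: the inner prefetches are fused first, then the outer query
# re-scores the fused candidates by similarity to the dense vector.
nested_request = {
    "prefetch": {
        "prefetch": flat_request["prefetch"],
        "query": {"fusion": "rrf"},
        "limit": 30,
    },
    "query": dense_query,
    "using": "dense",
    "limit": 10,
}

# Both bodies serialize to JSON that could be sent to the query endpoint.
flat_body = json.dumps(flat_request)
nested_body = json.dumps(nested_request)
```

In the nested form each stage narrows the candidate set, so the inner limits should be larger than the outer ones.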

Scores Are Not Comparable Across Prefetches and You Want an Easy Baseline

Use when: searches produce scores on different scales, such as BM25 scores and cosine similarity on dense embeddings.

RRF

RRF (Reciprocal Rank Fusion) is rank-based and ignores score magnitudes, which makes it a decent default to start with.
Tune k to control how sensitive the fused score is to rank.
Use Weighted RRF to add per-prefetch weights when one search should dominate. Weights should be tuned per collection and per retriever score distribution.
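The points above can be illustrated with a minimal, library-free sketch of weighted RRF. It assumes the common textbook formulation, `weight / (k + rank)` with 1-based ranks, and `k=60` as a frequently used default. Qdrant's server-side constants and rank offset may differ, and `rrf_fuse` and the sample ids are hypothetical names.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60, weights=None):
    """Fuse ranked lists of ids with (optionally weighted) reciprocal rank fusion."""
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    if len(weights) != len(ranked_lists):
        raise ValueError("one weight per ranked list is required")

    fused = defaultdict(float)
    for ids, weight in zip(ranked_lists, weights):
        # Only the position in the list matters, never the raw score.
        for rank, point_id in enumerate(ids, start=1):
            fused[point_id] += weight / (k + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


bm25_ids = ["a", "b", "c"]   # toy result of a sparse prefetch, best first
dense_ids = ["c", "a", "d"]  # toy result of a dense prefetch, best first

unweighted = rrf_fuse([bm25_ids, dense_ids])
dense_heavy = rrf_fuse([bm25_ids, dense_ids], weights=[1.0, 3.0])
```

With equal weights, "a" wins because it ranks high in both lists. Tripling the dense weight moves "c" to the top. A smaller `k` widens the gap between adjacent ranks, so top positions count for more.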

DBSF

DBSF (Distribution-Based Score Fusion) normalizes the score distribution of each prefetch before fusing. Instead of min-max normalization, it uses the mean ± 3 standard deviations of the prefetched list of scores as the bounds. Avoid relying on the resulting absolute scores: DBSF normalizes per prefetch (that is, per retrieved list of search results), so scores may not be comparable across queries.
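The normalization described above can be sketched in plain Python. This is an illustration of the idea, not Qdrant's implementation. The population standard deviation, clamping to [0, 1], summing the normalized scores, and the zero-variance fallback are all assumptions, and `dbsf_normalize` and `dbsf_fuse` are hypothetical names.

```python
from collections import defaultdict
from statistics import mean, pstdev


def dbsf_normalize(scores):
    """Map scores to [0, 1] using mean +- 3 standard deviations as the bounds."""
    mu = mean(scores)
    sigma = pstdev(scores)  # assumption: population standard deviation
    if sigma == 0:
        return [0.5 for _ in scores]  # assumption: flat list maps to the middle
    low, high = mu - 3 * sigma, mu + 3 * sigma
    # Assumption: values outside the bounds are clamped.
    return [min(1.0, max(0.0, (s - low) / (high - low))) for s in scores]


def dbsf_fuse(prefetches):
    """prefetches: list of hit lists, each hit being an (id, score) pair."""
    fused = defaultdict(float)
    for hits in prefetches:
        normalized = dbsf_normalize([score for _, score in hits])
        for (point_id, _), norm in zip(hits, normalized):
            fused[point_id] += norm  # assumption: fuse by summing
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


bm25_hits = [("a", 12.0), ("b", 9.0), ("c", 3.0)]      # toy BM25 scores
dense_hits = [("c", 0.91), ("a", 0.88), ("d", 0.40)]   # toy cosine scores
fused = dbsf_fuse([bm25_hits, dense_hits])
```

Because the bounds are recomputed for every retrieved list, the same document can receive a different normalized score under a different query, which is why the absolute values should not be thresholded.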

Need Custom Fusion

Use when: recency, popularity, or other payload values should affect the merged ranking alongside candidate scores, or you need a custom fusion.

With a formula query, you can access the score of each prefetch and, if desired, payload field values.

If you want to implement custom fusion on the scores of each prefetch:

Use decay or any other available expression to normalize the score distributions before fusing them.
Base the parameters of these expressions on the collection and retriever score distributions (for example, by tuning them on a subsample of real queries).
The formula query cannot provide ranks, so rank-based custom fusion is not available.
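The arithmetic behind such a custom fusion can be sketched as follows. The code simulates the math client-side and is not a formula query itself. The exponential decay definition, the weights, the `target`/`scale` values, and all names are assumptions that would need tuning on real score distributions.

```python
import math


def exp_decay(x, target=0.0, scale=1.0, midpoint=0.5):
    """Return 1.0 at x == target and `midpoint` at distance `scale` from it."""
    return math.exp(math.log(midpoint) * abs(x - target) / scale)


def custom_score(bm25_score, dense_score, age_days):
    # Squash unbounded BM25 scores into (0, 1]. The target should sit at the
    # upper end of the observed scores, since values above it also decay.
    bm25_norm = exp_decay(bm25_score, target=20.0, scale=10.0)
    # Payload-based recency boost: halves every 30 days (hypothetical field).
    recency = exp_decay(age_days, target=0.0, scale=30.0)
    # Hypothetical weights; cosine scores are assumed to be in [0, 1] already.
    return 0.4 * bm25_norm + 0.5 * dense_score + 0.1 * recency


# Toy candidates: id -> (bm25 score, dense score, age in days).
candidates = {
    "fresh": (18.0, 0.80, 2),
    "stale": (18.0, 0.80, 400),
    "weak": (4.0, 0.35, 1),
}
ranked = sorted(candidates, key=lambda pid: custom_score(*candidates[pid]), reverse=True)
```

Here "fresh" and "stale" have identical retrieval scores, so the payload-based recency term alone decides their order.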

Need Good Ranking of Fused Candidates and Are Ready to Spend More Resources

Use when: you want the similarity between the query and the candidates' vector representations to serve as both the combiner of the prefetches and the ranker. This is more resource-heavy than score- or rank-based fusion, but it might be necessary because of use case requirements or the need for high top-K precision (when the parallel prefetches together already achieve good recall).

You can use any type of vector (sparse, dense, or multivector) as the outer query over the prefetches to perform the fusion server-side in one Query API request. For that, the same type of vector representation must be stored for the documents as a named vector on each point.

Instead of client-side fusion through cross-encoders, a popular option is fusion based on late interaction models, through reranking on multivectors (e.g. ColBERT for text, ColPali and ColQwen for images).

Most precise, but with the highest compute and resource usage.
Configure the multivectors used for fusion through reranking with HNSW disabled, as in the Hybrid Search with Reranking tutorial.
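To show why multivector reranking costs more than rank or score fusion, here is a small sketch of late-interaction (MaxSim) scoring in the style of ColBERT. It is an illustration in plain Python, not Qdrant's server-side code. The two-dimensional token vectors and the names `max_sim` and `rerank` are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def max_sim(query_tokens, doc_tokens):
    """For each query token take its best-matching document token, then sum."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def rerank(query_tokens, candidates):
    """candidates: id -> list of token vectors (union of all prefetched hits)."""
    scored = [(pid, max_sim(query_tokens, tokens)) for pid, tokens in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


query = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "a": [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens
    "b": [[1.0, 0.0], [1.0, 0.0]],  # matches only the first query token
    "c": [[0.6, 0.6]],              # partial match for both
}
reranked = rerank(query, candidates)
```

Every candidate requires a query-tokens by document-tokens comparison, so the cost grows with both the candidate count and the tokens per document. This is why prefetches should first narrow the candidate set.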

What NOT to Do

Use linear weighted fusion on incomparable score ranges: a weight applied to raw scores on different scales mostly encodes the scale gap, so the retriever with the larger scale dominates regardless of relevance.
Use "vibe"-defined weights in weighted RRF. Weights should be fine-tuned per dataset and retrieval pipeline.
Pick any fusion type without comparative experiments.
Use late interaction multivectors for fusion without evaluating cheaper alternatives, for example MUVERA. More in the Qdrant multi-vector search course.

GitHub Repository

qdrant/skills

Path: skills/qdrant-search-quality/search-strategies/hybrid-search/combining-searches
