# MediMaven LambdaMART Learning-to-Rank (v1.1)

Gradient-boosted decision-tree ranker that fuses lexical, semantic, and structural signals into a single final relevance score for our medical RAG pipeline.
## Why this model?
| Aspect | Detail |
|---|---|
| Algorithm | LightGBM LambdaMART (`lambdarank` objective) |
| Features (15) | BM25 score, cosine similarity (BGE embeddings), cross-encoder score, passage length, section depth, URL authority, … |
| Training data | 200 k synthetic triplets (query, positive, negative) auto-mined from the MediMaven dataset (WebMD, NHS, NIH) |
| Metric optimised | nDCG@10 |
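Each (query, passage) candidate is scored on one row combining all 15 features. A minimal sketch of the assembly, using a hypothetical subset of the features above (names, order, and values are illustrative; the real model expects the exact 15-column order used at training time):

```python
import numpy as np

# Hypothetical feature extraction for one (query, passage) candidate.
def build_feature_vector(bm25, cos_sim, ce_score, length, depth, authority):
    """Concatenate per-candidate signals into one model-input row."""
    return np.array([bm25, cos_sim, ce_score, length, depth, authority],
                    dtype=np.float32)

vec = build_feature_vector(8.7, 0.82, 0.75, 120, 2, 0.91)
```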
## Quick start
```python
import numpy as np
import lightgbm as lgb

# 1. Load the model (download the repo files locally first,
#    e.g. with `huggingface_hub.snapshot_download`)
model_path = "dranreb1660/[email protected]"
booster = lgb.Booster(model_file=model_path + "/ltr_lambdamart.txt")

# 2. Prepare a feature matrix for a single query:
#    one 15-feature row per candidate passage
features = np.array([
    [8.7, 0.82, 0.75, 120, 2, 0.91, ...],  # candidate doc 1
    [7.2, 0.67, 0.55, 300, 3, 0.80, ...],  # candidate doc 2
])
scores = booster.predict(features)

# 3. Sort passages by `scores` (higher = better)
best_idx = np.argsort(-scores)
```
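Once `scores` are in hand, reranking is a single descending argsort over the candidate list. A toy example with hypothetical passages and made-up scores (no model call, just the sort step):

```python
import numpy as np

# Hypothetical candidates, aligned row-for-row with the feature matrix
passages = [
    "passage about dosage",
    "passage about side effects",
    "off-topic passage",
]
scores = np.array([0.4, 1.8, -0.7])  # made-up LambdaMART outputs

best_idx = np.argsort(-scores)       # indices in descending score order
reranked = [passages[i] for i in best_idx]
```

`reranked[0]` is the passage the ranker considers most relevant for the query.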
## Validation
| Metric | BM25 only | BM25 → Cross-Encoder | BM25 → LambdaMART |
|---|---|---|---|
| nDCG@10 | 0.38 | 0.46 | 0.55 |
| Recall@20 | 0.71 | 0.81 | 0.88 |
Evaluated on 1 k manually judged medical queries (Aug 2025).
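For reference, nDCG@10 is the model's DCG over the top-10 results divided by the DCG of the ideal ordering. A self-contained sketch of the metric (standard formulation with exponential gain; not the exact evaluation code used here):

```python
import numpy as np

def dcg_at_k(rels, k=10):
    """Discounted cumulative gain over the top-k relevance labels."""
    rels = np.asarray(rels, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rels.size + 2))  # log2(rank + 1)
    return float(np.sum((2.0 ** rels - 1.0) / discounts))

def ndcg_at_k(rels, k=10):
    """nDCG@k: DCG of the predicted order divided by the ideal DCG."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# Relevance labels of results in predicted order (toy example)
score = ndcg_at_k([3, 2, 0, 1], k=10)
```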
## Training recipe
```yaml
num_leaves: 255
learning_rate: 0.05
n_estimators: 800
min_data_in_leaf: 20
feature_fraction: 0.9
lambda_l1: 0.0
lambda_l2: 0.1
metric: ndcg
ndcg_eval_at: 10
```
Hardware: 1 × Intel Xeon 6258R, ~4 min training time.
## Citation
```bibtex
@misc{medimaven2025ltr,
  title        = {MediMaven LambdaMART LTR},
  author       = {Kyei-Mensah, Bernard},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/dranreb1660/medimaven-ltr-lambdamart}}
}
```