Table 1. Dataset composition
| Dataset | No. of images | No. of unique pills | Imaging conditions | Train/Val/Test split | Data leakage prevention |
|---|---|---|---|---|---|
| NLM Benchmark (subset of 24,404) | 3,887 | 1,000 | Reference (standardized lighting, front and back views) + Consumer (mobile-phone captures with variable lighting and backgrounds) | 1,000 (Train)/— (Val)/2,887 (Test) | Near-duplicate images excluded across splits; same pill identity not shared between train and test |
| Custom Real-World Dataset | 500 | 50 | Uncontrolled settings: cluttered backgrounds, occlusion, inconsistent lighting | 70% (Train)/15% (Val)/15% (Test) | Duplicate and same-identity instances excluded across splits; used exclusively for robustness validation |
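The leakage-prevention column describes two safeguards: removing duplicate images and keeping every pill identity inside a single split. The sketch below illustrates both in plain Python, assuming each sample is a `(pill_id, image_bytes)` pair. The function names (`dedupe`, `identity_disjoint_split`, `check_no_leakage`) are hypothetical and this is not the authors' pipeline. Exact SHA-256 matching stands in for near-duplicate detection; catching re-encoded or slightly shifted copies would require something like a perceptual hash or an embedding-distance threshold.

```python
import hashlib
import random
from collections import defaultdict


def dedupe(samples):
    """Drop exact byte-level duplicates (a simplified stand-in for near-duplicate filtering)."""
    seen, kept = set(), []
    for pill_id, image_bytes in samples:
        digest = hashlib.sha256(image_bytes).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append((pill_id, image_bytes))
    return kept


def identity_disjoint_split(samples, percents=(70, 15, 15), seed=0):
    """Assign whole pill identities to train/val/test so that no identity spans two splits."""
    by_id = defaultdict(list)
    for pill_id, image_bytes in samples:
        by_id[pill_id].append((pill_id, image_bytes))
    ids = sorted(by_id)
    random.Random(seed).shuffle(ids)  # deterministic shuffle of identities, not images
    n = len(ids)
    n_train = n * percents[0] // 100  # integer arithmetic avoids float rounding surprises
    n_val = n * percents[1] // 100
    groups = {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # test takes the remaining identities
    }
    return {name: [s for pid in pids for s in by_id[pid]] for name, pids in groups.items()}


def check_no_leakage(splits):
    """Return True if no pill identity and no image hash appears in more than one split."""
    ids = {name: {pid for pid, _ in items} for name, items in splits.items()}
    hashes = {
        name: {hashlib.sha256(b).hexdigest() for _, b in items}
        for name, items in splits.items()
    }
    names = list(splits)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if ids[a] & ids[b] or hashes[a] & hashes[b]:
                return False
    return True


if __name__ == "__main__":
    # Synthetic stand-in: 50 identities x 10 images, plus 5 injected exact duplicates.
    data = [(f"pill_{i:02d}", f"img-{i}-{k}".encode()) for i in range(50) for k in range(10)]
    clean = dedupe(data + data[:5])
    splits = identity_disjoint_split(clean)
    print({name: len(items) for name, items in splits.items()}, check_no_leakage(splits))
```

Because whole identities are assigned to a split, the realized image proportions only approximate 70/15/15. For example, 15% of 50 identities is 7.5, so one of the smaller splits must round. A library alternative for the grouping step is scikit-learn's group-aware splitters, which take the pill identity as the group label.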