The generation of index-based series of meteorological phenomena, derived from narrative descriptions of weather and climate in historical documentary sources, is a common method for reconstructing past climatic variability and effectively extending the instrumental record. This study is the first to explicitly examine the degree of inter-rater variability in producing such series, a potential source of bias in index-based analyses. Two groups of raters were asked to produce a five-category annual rainfall index series for the same dataset, consisting of transcribed narrative descriptions of meteorological variability for 11 "rain years" in nineteenth-century Lesotho, originally collected by Nash and Grab (2010). The first group (n = 71) comprised students studying for postgraduate qualifications in climatology or a related discipline; the second group (n = 6) consisted of professional meteorologists and historical climatologists working in southern Africa. Inter-rater reliability was high for both groups (r = 0.99 for the student raters and r = 0.94 for the professional raters), although ratings provided by the student group disproportionately averaged to the central value (0: normal/seasonal rains) in years where variability between raters was high. Back-calculation of the intraclass correlation using the Spearman-Brown prediction formula showed that a target reliability of 0.9 (considered "excellent" in other published studies) could be obtained with as few as eight student raters or four professional raters. This number fell to two for a subset of the professional group (n = 4) who had previously published historical climatology papers on southern Africa. We therefore conclude that variability between researchers should be considered minimal where index-based climate reconstructions are generated by trained historical climatologists working in groups of two or more.
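The Spearman-Brown back-calculation described above can be illustrated with a minimal sketch. It assumes that each reported coefficient is the reliability of the pooled (average) rating of all n raters in a group, which the abstract does not state explicitly; the function names are hypothetical and are not taken from the study.

```python
import math


def single_rater_reliability(group_reliability: float, n_raters: int) -> float:
    """Invert Spearman-Brown: reliability of one rater implied by the
    reliability of the mean rating of n_raters raters (assumed input)."""
    return group_reliability / (n_raters - (n_raters - 1) * group_reliability)


def predicted_reliability(single: float, k_raters: int) -> float:
    """Spearman-Brown prediction: reliability of the mean of k_raters raters."""
    return k_raters * single / (1 + (k_raters - 1) * single)


def raters_needed(group_reliability: float, n_raters: int, target: float = 0.9) -> int:
    """Smallest whole number of raters whose pooled rating reaches target."""
    single = single_rater_reliability(group_reliability, n_raters)
    k = target * (1 - single) / (single * (1 - target))
    # Small tolerance so an exactly-integer k is not pushed up by float error.
    return math.ceil(k - 1e-9)


# Rounded coefficients quoted in the abstract; group sizes as reported there.
for label, r, n in [("students", 0.99, 71), ("professionals", 0.94, 6)]:
    print(label, raters_needed(r, n))
```

Because the coefficients quoted in the abstract are rounded to two decimal places, the counts printed by this sketch need not match the published figures exactly; the unrounded values from the full analysis would be required to reproduce them.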