Poor multi-rater reliability in TCM pattern diagnoses and variation in the use of symptoms to obtain a diagnosis
Oddveig Birkeflet¹, Petter Laake², Nina K Vøllestad¹
¹ Institute of Health and Society, University of Oslo, Oslo, Norway
² Department of Biostatistics, Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway
Correspondence to Oddveig Birkeflet, Institute of Health and Society, University of Oslo, N-0318 Oslo, Norway; oddveig.birkeflet{at}


Background Pattern differentiation and diagnosis are fundamental principles of Traditional Chinese Medicine (TCM). Studies have shown low inter-rater reliability in TCM pattern diagnoses. This variability may originate from both the identification and the interpretation of symptoms and signs.

Objective To examine the inter-rater reliability in TCM pattern diagnoses made in the style of Maciocia for 25 case histories by eight acupuncturists and to explore the impact of demographic factors on the diagnostic conclusion. Further, the association between the diagnosis and the presence of symptoms was examined for a single TCM diagnosis.

Methods Eight acupuncturists independently diagnosed 25 women (15 fertile, 10 infertile) based on written case histories. Descriptive statistics, logistic regression and inter-rater reliability (κ) were used.
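
As an illustration of how a multi-rater κ can be computed, the sketch below implements Fleiss' kappa in Python. This is an assumption for illustration only: the abstract does not state which κ variant the authors used, and the function name `fleiss_kappa` and the example counts are hypothetical, not study data. Each row is one case history, and each column holds the number of raters who did or did not assign a given TCM pattern to that case.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a cases-by-categories table of rater counts.

    counts[i][j] is the number of raters who placed case i in category j.
    Every case must be rated by the same number of raters.
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every case needs the same number of ratings")
    n_categories = len(counts[0])

    # Observed agreement: share of agreeing rater pairs, averaged over cases.
    per_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_case) / n_cases

    # Chance agreement from the overall proportion of ratings per category.
    proportions = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical counts for one pattern: 5 cases, 8 raters each,
# columns = [pattern assigned, pattern not assigned].
example = [[8, 0], [0, 8], [4, 4], [6, 2], [1, 7]]
print(round(fleiss_kappa(example), 3))
```

Under this formulation, κ equals 1 when all raters agree on every case and falls to 0 or below when agreement is no better than chance, which is the scale against which a result such as κ<0.20 is read.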

Results Inter-rater reliability on TCM patterns was poor (κ<0.20), and the number of TCM pattern diagnoses varied widely. Sex, duration of practice and education had a highly significant effect (p<0.001) on the use of TCM patterns, and working hours had a significant effect (p=0.029). There was considerable intra- and inter-rater variation in the use of symptoms to make a diagnosis. Both frequently and infrequently occurring symptoms were used inconsistently to diagnose Liver Qi Stagnation. The study was limited by a small sample size.

Conclusions The results showed extensive variation and poor inter-rater reliability in TCM diagnoses. Demographic variables influenced the frequency of diagnoses, and symptoms were used inconsistently to reach a diagnosis. The variability shown could impede individually tailored treatment.

