Table 4. Inter-annotator agreement scores per domain and data source, reported as average per-label F1 scores, macro-averaged F1, and accuracy (standard deviation in brackets)

From: Classifying patient and professional voice in social media health posts

| Metric | Cardio/Reddit | Cardio/Twitter | Skin/Reddit | Skin/Twitter | All |
|---|---|---|---|---|---|
| F1: Other | 0.90 (0.01) | 0.93 (0.03) | 0.89 (0.02) | 0.95 (0.03) | 0.93 (0.03) |
| F1: Patient voice | 0.96 (0.01) | 0.69 (0.09) | 0.97 (0.01) | 0.53 (0.19) | 0.93 (0.03) |
| F1: Professional voice | 0.85 (0.03) | 0.59 (0.07) | 0.18 (0.15) | – (–) | 0.59 (0.06) |
| Macro-averaged F1 | 0.90 (0.03) | 0.73 (0.06) | 0.68 (0.05) | 0.74 (0.11) | 0.81 (0.04) |
| Accuracy | 0.94 (0.01) | 0.87 (0.04) | 0.95 (0.01) | 0.91 (0.05) | 0.92 (0.03) |
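
To make the three reported metrics concrete, the following is a minimal sketch of how per-label F1, macro-averaged F1, and accuracy could be computed between two annotators. It is an illustration under stated assumptions, not the procedure used in the article: the function names and the toy label sequences are hypothetical, one annotator is arbitrarily treated as the reference (F1 is symmetric, so the choice does not change the score), and a label that neither annotator used is skipped when macro-averaging. That last assumption is at least consistent with the Skin/Twitter column, where the mean of 0.95 and 0.53 is 0.74.

```python
# Hypothetical sketch: agreement metrics between two annotators.
# Label names follow the table; everything else is an assumption.
LABELS = ["Other", "Patient voice", "Professional voice"]


def per_label_f1(reference, other, labels=LABELS):
    """Return {label: F1}, treating `reference` as gold and `other` as predictions.

    A label that neither annotator used gets None (F1 is undefined).
    """
    pairs = list(zip(reference, other))
    scores = {}
    for label in labels:
        tp = sum(r == label and o == label for r, o in pairs)
        fp = sum(r != label and o == label for r, o in pairs)
        fn = sum(r == label and o != label for r, o in pairs)
        if tp + fp + fn == 0:
            scores[label] = None
        else:
            # Equivalent to the harmonic mean of precision and recall.
            scores[label] = 2 * tp / (2 * tp + fp + fn)
    return scores


def macro_f1(scores):
    """Unweighted mean of the defined per-label F1 scores (assumption: skip None)."""
    defined = [s for s in scores.values() if s is not None]
    return sum(defined) / len(defined)


def accuracy(reference, other):
    """Fraction of items on which the two annotators chose the same label."""
    return sum(r == o for r, o in zip(reference, other)) / len(reference)


# Toy data, invented for illustration only.
annotator_a = ["Other", "Other", "Patient voice",
               "Patient voice", "Professional voice", "Other"]
annotator_b = ["Other", "Patient voice", "Patient voice",
               "Patient voice", "Professional voice", "Other"]

f1_by_label = per_label_f1(annotator_a, annotator_b)
print(f1_by_label)
print(macro_f1(f1_by_label))
print(accuracy(annotator_a, annotator_b))
```

With more than two annotators, one plausible reading of the "average" and "standard deviation" in the caption is that such scores are computed for each annotator pair and then summarised; the sketch covers a single pair only.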