Table 8 Results for the Reddit and Twitter models on in- and out-of-source test data sets, compared to the baseline model trained on all of the data

From: Classifying patient and professional voice in social media health posts

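| Model(s) | Other: F1 | Patient voice: F1 | Prof. voice: F1 | Macro F1 | Acc. | Test set: size |
|---|---|---|---|---|---|---|
| Reddit | 0.94 | 0.95 | 0.86 | 0.92 | 0.95 | Reddit: 3933 |
| Twitter | 0.74 | 0.69 | 0.00 | 0.47 | 0.71 | Reddit: 3933 |
| All | 0.85 | 0.88 | 0.30 | 0.68 | 0.86 | Reddit: 3933 |
| Reddit | 0.83 | 0.50 | 0.00 | 0.44 | 0.73 | Twitter: 1941 |
| Twitter | 0.98 | 0.90 | 0.90 | 0.93 | 0.96 | Twitter: 1941 |
| All | 0.90 | 0.64 | 0.26 | 0.60 | 0.83 | Twitter: 1941 |
| Reddit&Twitter | 0.96 | 0.95 | 0.88 | 0.92 | 0.95 | All: 5474 |
| All | 0.87 | 0.85 | 0.28 | 0.66 | 0.85 | All: 5474 |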
  1. The last two rows compare the combined results of the Reddit and Twitter models, each tested on its in-source test data, against the baseline model trained on all the data. We report F1 scores per label, macro-average F1, and accuracy across all three label types, as well as the size of the test set
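
As a reading aid, the macro-average F1 column can be checked against the per-label columns. The sketch below assumes the standard definition of macro F1 as the unweighted mean of the per-label F1 scores; the source does not spell out the formula, and the names `macro_f1`, `ROWS`, and `max_rounding_gap` are illustrative only. Because every value in the table is rounded to two decimals, the recomputed mean can differ from the reported figure by up to about 0.01.

```python
def macro_f1(per_label_f1):
    """Unweighted mean of per-label F1 scores (assumed definition of macro F1)."""
    return sum(per_label_f1) / len(per_label_f1)


# (model, test set) -> ([Other, Patient voice, Prof. voice] F1, reported macro F1)
# Values are copied from Table 8 as printed.
ROWS = {
    ("Reddit", "Reddit"): ([0.94, 0.95, 0.86], 0.92),
    ("Twitter", "Reddit"): ([0.74, 0.69, 0.00], 0.47),
    ("All", "Reddit"): ([0.85, 0.88, 0.30], 0.68),
    ("Reddit", "Twitter"): ([0.83, 0.50, 0.00], 0.44),
    ("Twitter", "Twitter"): ([0.98, 0.90, 0.90], 0.93),
    ("All", "Twitter"): ([0.90, 0.64, 0.26], 0.60),
    ("Reddit&Twitter", "All"): ([0.96, 0.95, 0.88], 0.92),
    ("All", "All"): ([0.87, 0.85, 0.28], 0.66),
}


def max_rounding_gap(rows):
    """Largest absolute gap between the recomputed and the reported macro F1."""
    return max(abs(macro_f1(f1s) - reported) for f1s, reported in rows.values())


if __name__ == "__main__":
    for (model, test_set), (f1s, reported) in ROWS.items():
        print(f"{model:>14} on {test_set:<7} recomputed={macro_f1(f1s):.3f} reported={reported:.2f}")
```

The zero professional-voice F1 in the out-of-source rows pulls the macro average well below accuracy, which is why the two columns diverge most for the cross-source tests.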