Table 3 Number of tokens/unique tokens per data set and split

From: Classifying patient and professional voice in social media health posts

                Reddit            Twitter          Both data sources
Cardiovascular
  Train         831,169/26,037    119,087/16,118   950,256/34,998
  Test          211,486/13,302    30,257/6,729     241,743/17,094
Skin
  Train         1,159,225/29,176  98,410/13,639    1,257,635/35,854
  Test          290,227/14,201    24,337/5,483     314,564/16,731
Both domains
  Train         1,990,394/43,390  217,497/25,441   2,207,891/57,118
  Test          501,713/21,779    54,594/10,201    556,307/26,444