It is well known from studies of inter-rater reliability that assessments of writing tests vary. To discuss this rater variation, we pose two research questions: 1. How can rater variation be understood from a professional, i.e. teachers', perspective? 2. What characterises Swedish (mother-tongue) teachers' assessments of writing tests? The first question is addressed in a meta-study of previous research, and the second in a study of 14 Swedish teachers' ratings of texts from a national written composition test in upper secondary school. The results show that teachers of the same subject agree more closely, i.e. display less rater variation, than other rater groups. It is also clear that writing tests are notoriously difficult to rate: inter-rater correlation coefficients very rarely reach the desirable 0.7, the level at which roughly 50 % of the variance can be explained by shared norms. Another main result concerns criteria and tools for assessment. Such tools should be grounded in teachers' professional expertise, in their expectations for different levels of performance. Our study reveals several situations where teachers' professional expertise clashes with assessment criteria. The article concludes that valid assessment of high-stakes tests must accommodate both a technical rationality, i.e. grading should be predictable from rater to rater, and a hermeneutic rationality, i.e. grading must be based on teachers' professional judgment.
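The link between the 0.7 threshold and the share of explained variance follows from the standard relation between a correlation coefficient and the coefficient of determination; as a brief illustration (not part of the original study's calculations):

\[
r^2 = 0.7^2 = 0.49 \approx 50\,\%
\]

where \(r\) denotes the inter-rater correlation and \(r^2\) the proportion of variance in one rater's scores that is accounted for by another rater's scores.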