UDC 004.912
JUSTIFICATION FOR THE USE OF COHEN’S KAPPA STATISTIC
IN EXPERIMENTAL STUDIES OF NLP AND TEXT MINING
Abstract. Modern metrics for evaluating agreement between experimental results and expert opinion are compared, and their applicability to experimental research in automatic text processing with machine learning methods is assessed. The choice of Cohen’s kappa coefficient as a measure of agreement of expert opinions in NLP and Text Mining tasks is justified. Two examples are given: using Cohen’s kappa to evaluate the level of agreement between an expert’s opinion and the results of machine-learning classification, and measuring the agreement of expert opinions on sentence alignment in the Kazakh–Russian parallel corpus. This analysis shows that Cohen’s kappa coefficient is one of the best statistical methods for determining the level of agreement in experimental studies owing to its ease of use, simplicity of calculation, and high accuracy of the result.
Keywords: Text Mining, NLP, Cohen’s kappa statistic, agreement statistic, text classification with machine learning, parallel corpus.
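For reference, Cohen’s kappa is computed as k = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement between two raters and p_e is the agreement expected by chance from the raters’ marginal label distributions. Below is a minimal Python sketch of this computation; the expert and classifier labels are hypothetical illustration data, not results from the study.

from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items with nominal categories."""
    n = len(labels_a)
    # Observed agreement: share of items on which both raters chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in set(labels_a) | set(labels_b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels: an expert's judgments vs. an ML classifier's output for ten sentences.
expert = ["crime", "crime", "other", "crime", "other", "other", "crime", "other", "crime", "other"]
model  = ["crime", "other", "other", "crime", "other", "crime", "crime", "other", "crime", "other"]
print(cohen_kappa(expert, model))  # 0.6: observed agreement 0.8, chance agreement 0.5

The same routine applies to two experts’ sentence-alignment decisions; by the common Landis and Koch convention, values of 0.61 to 0.80 are read as substantial agreement.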
REFERENCES
- Lindstädt R., Proksch S.-O., Slapin J.B. When experts disagree: response aggregation and its consequences in expert surveys. Political Science Research and Methods. 2020. Vol. 8, Iss. 3. P. 580–588. https://doi.org/10.1017/psrm.2018.52.
- Cohen J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement. 1960. Vol. 20, N 1. P. 37–46. https://doi.org/10.1177/001316446002000104.
- Freitag R.M.K. Kappa statistic for judgment agreement in Sociolinguistics. Revista de Estudos da Linguagem. 2019. Vol. 27, N 4. P. 1591–1612. http://dx.doi.org/10.17851/2237-2083.0.0.1591-1612.
- Franceschini F., Maisano D. Decision concordance with incomplete expert rankings in manufacturing applications. Research in Engineering Design. 2020. Vol. 31, Iss. 4. P. 471–490. https://doi.org/10.1007/s00163-020-00340-x.
- Mielke P.W. Jr., Berry K.J., Johnston J.E. Unweighted and weighted kappa as measures of agreement for multiple judges. International Journal of Management. 2009. Vol. 26, N 2. P. 213–223.
- Banerjee M., Capozzoli M., McSweeney L., Sinha D. Beyond Kappa: a review of interrater agreement measures. The Canadian Journal of Statistics. 1999. Vol. 27, Iss. 1. P. 3–23. https://doi.org/10.2307/3315487.
- Gwet K.L. Handbook of inter-rater reliability. Gaithersburg: Advanced Analytics, LLC, 2014. 428 p.
- Conger A.J. Integration and generalization of kappas for multiple raters. Psychological Bulletin. 1980. Vol. 88, Iss. 2. P. 322–328. https://doi.org/10.1037/0033-2909.88.2.322.
- Nelson K.P., Edwards D. Measures of agreement between many raters for ordinal classifications. Statistics in Medicine. 2015. Vol. 34, Iss. 23. P. 3116–3132. https://doi.org/10.1002/sim.6546.
- Ohyama T. Statistical inference of agreement coefficient between two raters with binary outcomes. Communications in Statistics – Theory and Methods. 2020. Vol. 49, Iss. 10. P. 2529–2539. https://doi.org/10.1080/03610926.2019.1576894.
- Fleiss J.L. Measuring nominal scale agreement among many raters. Psychological Bulletin. 1971. Vol. 76, Iss. 5. P. 378–382. https://doi.org/10.1037/h0031619.
- Light R.J. Measures of response agreement for qualitative data: Some generalizations and alternatives. Psychological Bulletin. 1971. Vol. 76, Iss. 5. P. 365–377. https://doi.org/10.1037/h0031643.
- Khairova N., Kolesnyk A., Mamyrbayev O., Mukhsina K. The aligned Kazakh-Russian parallel corpus focused on the criminal theme. Proc. 3rd International Conference on Computational Linguistics and Intelligent Systems (COLINS-2019) (18–19 April 2019, Kharkiv, Ukraine). Kharkiv, 2019. P. 116–125.
- Khairova N.F., Kolesnik A.S., Mamyrbaev O.Zh., Mukhsina K.Zh. Aligned Kazakh-Russian parallel corpus focused on criminal topics. Bulletin of Almaty University of Energy and Communications. 2020. N 1 (48). P. 84–92.
- Nichols T.R., Wisner P.M., Cripe G., Gulabchand L. Putting the Kappa statistic to use. The Quality Assurance Journal. 2010. Vol. 13, Iss. 3–4. P. 57–61. https://doi.org/10.1002/qaj.481.