Fleiss' Kappa is a statistical measure of inter-rater reliability: the degree to which a fixed number of raters, annotators, or models agree when assigning items to a set of categories. Like Cohen's Kappa, which applies only to two raters, it corrects for the level of agreement expected purely by chance, but it generalizes to any fixed number of raters. In agentic cognitive architectures, it quantifies the consistency of multiple reasoning paths or model outputs, providing a foundation for self-consistency mechanisms such as ensemble averaging or majority voting.

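Below is a minimal sketch of how Fleiss' Kappa could be computed from a count matrix, where row `i`, column `j` holds the number of raters who assigned item `i` to category `j`. The function name and the ratings data are illustrative assumptions, not part of any specific library; the formula itself follows the standard definition (mean observed pairwise agreement versus chance agreement from the marginal category proportions).

```python
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """Fleiss' Kappa for an (n_items x n_categories) matrix of rating counts.

    counts[i, j] = number of raters who assigned item i to category j.
    Assumes every item was rated by the same number of raters.
    """
    counts = np.asarray(counts, dtype=float)
    n_items, _ = counts.shape
    n_raters = counts.sum(axis=1)[0]  # raters per item (assumed constant)

    # Per-item agreement: fraction of rater pairs that agree on item i.
    p_i = (np.sum(counts ** 2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()  # mean observed agreement across items

    # Chance agreement from the marginal category proportions.
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    p_e = np.sum(p_j ** 2)

    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical example: 4 items, 3 categories, 5 raters per item.
    ratings = np.array([
        [5, 0, 0],  # all five raters chose category 0
        [2, 3, 0],
        [0, 4, 1],
        [1, 1, 3],
    ])
    print(f"Fleiss' kappa = {fleiss_kappa(ratings):.3f}")  # ~0.336
```

In a self-consistency setting, the rows would correspond to prompts or subtasks and the columns to the discrete answers produced by the sampled reasoning paths; a higher kappa indicates that majority voting over those paths rests on genuine agreement rather than chance.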