Truth inference is the statistical process of estimating a single, reliable 'ground truth' label by aggregating multiple, potentially noisy or conflicting labels from different sources. These sources are typically human annotators (e.g., in crowdsourcing) or diverse machine learning models. The core challenge is that individual labels are often imperfect, containing errors, biases, or random noise. Truth inference algorithms, such as Dawid-Skene or Majority Vote with EM, model the reliability of each source and iteratively infer the most probable true label, improving data quality for downstream model training.
