Statistical Classification: Characterization and Construction of Admissible Procedures
To classify an observation, we assume that each class can be represented by a probability distribution, which might be the result of a previous estimate. A classical example is Fisher’s classification of iris species from measurements of their sepals and petals, with one distribution per class. With the increasing relevance of machine learning methods, classification is a current research topic. Applications include object classification in image recognition and text classification, with spam filters as a frequently cited example. Although classification problems arise almost everywhere in the digital world and numerous algorithmic solutions are being worked on, even elementary mathematical foundations seem to have been treated only incompletely or for special cases so far. Framing classification in terms of statistical decision theory, we consider a classification problem as a family of probability distributions (P_i : i ∈ I), with a finite class index set I serving as the decision space, and investigate several optimality criteria for randomised decision procedures. In this regard, we have shown that a generalisation of the Neyman-Pearson lemma characterises all admissible procedures, that is, procedures whose error probabilities are minimal in the sense that no other procedure improves on them uniformly. In certain binary problems, this characterisation yields procedures representable by class-separating nonlinear hypersurfaces. Hyperplanes therefore do not, in general, provide admissible classification, even if the training data are linearly separable. Further, we present geometrical conditions for admissibility based on the risk set, and deduce an analytical method for determining admissible procedures, in particular those that additionally fulfil the minimax condition or conditions on upper error bounds. In addition, admissibility characterisations will be provided for the case of composite classification and test problems.
Here, we will start with classical special cases, such as hypergeometric distributions in testing problems with two-sided alternatives, which apparently have not been treated satisfactorily in the literature. To this end, the underlying models will be examined for the property of strict total positivity.
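The remark that admissible procedures in certain binary problems are separated by nonlinear hypersurfaces can be illustrated with a small sketch. It is not taken from the project itself: the two class distributions (centred isotropic normal distributions in the plane with standard deviations 1 and 2), the critical value c = 1, and all function names are assumptions chosen for illustration. The rule below is a non-randomised likelihood-ratio rule of the kind singled out by the Neyman-Pearson lemma in the binary case; randomisation on the boundary is ignored because the boundary has probability zero under both distributions.

```python
import math


def log_density_isotropic_normal(x, sigma):
    """Log density of the normal distribution N(0, sigma^2 * I) at the point x."""
    d = len(x)
    squared_norm = sum(v * v for v in x)
    return (
        -0.5 * squared_norm / sigma**2
        - d * math.log(sigma)
        - 0.5 * d * math.log(2.0 * math.pi)
    )


def classify(x, sigma1=1.0, sigma2=2.0, c=1.0):
    """Likelihood-ratio rule: decide class 2 if f2(x) > c * f1(x), else class 1.

    sigma1, sigma2 and c are illustrative assumptions, not values from the text.
    """
    log_ratio = log_density_isotropic_normal(x, sigma2) - log_density_isotropic_normal(x, sigma1)
    return 2 if log_ratio > math.log(c) else 1


def boundary_radius(sigma1=1.0, sigma2=2.0, c=1.0, d=2):
    """Radius of the sphere on which f2(x) = c * f1(x), assuming sigma2 > sigma1."""
    numerator = math.log(c) + d * math.log(sigma2 / sigma1)
    denominator = 0.5 * (1.0 / sigma1**2 - 1.0 / sigma2**2)
    # If the numerator is negative, class 2 is chosen everywhere and the radius is 0.
    return math.sqrt(max(0.0, numerator / denominator))


if __name__ == "__main__":
    print("boundary radius:", boundary_radius())
    for point in [(0.0, 0.0), (3.0, 0.0), (-3.0, 0.0), (0.0, 3.0)]:
        print(point, "-> class", classify(point))
```

Under these assumptions the log likelihood ratio is a quadratic function of the distance from the origin, so the decision boundary is a circle: the origin is assigned to class 1, while points far from it in any direction are assigned to class 2. No half-plane reproduces this decision region, which is the sense in which a separating hyperplane fails here.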