Boletín de Estadística e Investigación Operativa

Vol. 32, No. 1, Marzo 2016, pp. 5-29


What are compositional data and how should they be analyzed?


Juan José Egozcue

Departamento de Ingeniería Civil y Ambiental

Universidad Politécnica de Cataluña


Vera Pawlowsky-Glahn

Departamento de Informática, Matemática Aplicada y Estadística

Universidad de Girona



Compositions describe parts of a whole which carry relative information.
Compositional data appear in all fields of science, and their analysis
requires paying attention to the appropriate sample space. The log-ratio
approach proposes the simplex, endowed with the Aitchison geometry, as
an appropriate sample space. The main characteristics of the Aitchison
geometry are presented, which open the door to compositional statistical
analysis. The main consequence is that compositions can be represented
in Cartesian coordinates by using the so-called isometric log-ratio
transformation. Standard statistical techniques can be used on these
coordinates. Employment-unemployment data for the period 2008-2015,
distributed by activity sectors across Comunidades Autónomas in Spain,
provide an example to demonstrate the exploratory capabilities of three
specific tools of compositional data analysis: the variation matrix, the
compositional biplot, and the dendrogram. An exploratory regression on time is also



Keywords: Compositional data analysis, Aitchison geometry, simplex,
variation matrix, compositional biplot, balance dendrogram, ilr, clr

AMS Subject classifications: 62-07, 62-02


© 2016 SEIO