The Pareto principle, more commonly known as the 80-20 rule, is quite useful. It states that roughly 80% of the effects come from 20% of the causes. It can be applied to all sorts of scenarios: 80% of a company's profits come from 20% of its customers, 80% of sales come from 20% of the sales staff, and, beyond business, 80% of healthcare resources are used by 20% of patients.
There may now be a sort of 80-20 rule for scientific data loss, if this study in Current Biology can be replicated across different areas of science. In the study, the authors looked at 516 ecology papers published between 1991 and 2011. They found that the older a study was, the more often its raw data could no longer be obtained. In the end, nearly 80% of scientific raw data seems to have been lost 20 years after a study's publication.
We must remember, however, that this is the same period that saw the highest uptake of digital technologies by the scientific community. So the 80-20 rule of scientific data loss may apply only to this period, or to some stretch of time before and after it.
With better technology, more collaboration and cheaper storage, journals and scientists are both getting better at keeping data accessible. Although most data becomes useless after a while because of the nature of research, trying to keep all raw data accessible will mean that the key data gets preserved.