Data science has been described as the Fourth Paradigm. It has been suggested that “big data” makes theory unnecessary, because powerful new machine learning and artificial intelligence software can make sense of the data without a preconceived scientific theory. I will attempt to show that software alone is insufficient: one needs to know what to look for, and that is difficult without theory. I argue that a “Fifth Paradigm” is needed to guide data science.
The theoretical framework that I have developed is the “Physics of Living Systems.” I have found that living systems are complex adaptive systems. This means that individual behaviors 1) are based on infinite variety, 2) fluctuate with “high-dimensional chaos,” and 3) develop over the life course.
These properties create the statistical phenomena that we observe when we try to measure population differences in groups of living systems. High-dimensional chaos produces essentially random probability distributions for individuals, although these individual propensity distributions may sometimes depart from the traditional bell curve.
We then need to understand how “selection” processes behave when measuring the cumulative distribution function (CDF) of chaotic events. The selection processes can be classed as “selection without replacement,” which creates a sigmoid CDF, and “selection with replacement,” which creates an exponential CDF.
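As a rough illustration of the two CDF shapes, the following minimal Python sketch simulates one possible reading of each process. The reading is an assumption made for this example, not the model developed in this work. For “selection without replacement,” each individual is assumed to have a bell-curve (normal) propensity threshold and to be selected once, and removed, when an exposure level passes that threshold. For “selection with replacement,” individuals are assumed to be drawn uniformly at random and returned to the pool, and the fraction selected at least once is tracked. All function names, the population size, and the random seed are hypothetical.

```python
import math
import random


def without_replacement_cdf(thresholds, levels):
    """Fraction of individuals already selected (and removed) at each exposure level.

    Assumption: an individual is selected once, when the level reaches its threshold.
    """
    n = len(thresholds)
    return [sum(t <= x for t in thresholds) / n for x in levels]


def with_replacement_cdf(n_individuals, n_draws, rng):
    """Fraction of individuals selected at least once after each uniform random draw.

    Assumption: every draw picks from the full pool, so repeats are possible.
    """
    seen = set()
    fractions = []
    for _ in range(n_draws):
        seen.add(rng.randrange(n_individuals))
        fractions.append(len(seen) / n_individuals)
    return fractions


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000              # hypothetical population size

# Without replacement: the empirical CDF of normal thresholds is sigmoid.
thresholds = [rng.gauss(0.0, 1.0) for _ in range(n)]
levels = [-2.0, -1.0, 0.0, 1.0, 2.0]
sigmoid = without_replacement_cdf(thresholds, levels)
normal_cdf = [0.5 * (1.0 + math.erf(x / math.sqrt(2.0))) for x in levels]

# With replacement: coverage after k draws is close to 1 - exp(-k / n).
draws = 3 * n
coverage = with_replacement_cdf(n, draws, rng)
exponential = [1.0 - math.exp(-(k + 1) / n) for k in range(draws)]

for x, s, c in zip(levels, sigmoid, normal_cdf):
    print(f"level {x:+.1f}: simulated {s:.3f}, normal CDF {c:.3f}")
for k in (n // 2, n, 2 * n, 3 * n):
    print(f"draws {k}: simulated {coverage[k - 1]:.3f}, exponential {exponential[k - 1]:.3f}")
```

Under these assumptions, the first curve follows the S-shaped normal CDF because it accumulates a bell-shaped distribution of thresholds, while the second rises steeply at first and then flattens, since each new draw is increasingly likely to hit an individual who was already selected.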