The Measurement Problem of Transhumanism
As a starting point, consider Max More’s original definition: Transhumanism “affirms the possibility and desirability of fundamentally improving the human condition.”
But what is the human condition, and how can we measure it? How do we know whether it has been improved? In this essay I call this the Measurement Problem of Transhumanism. Again quoting More, transhumanists specifically seek to develop technologies that can “eliminate aging” and “greatly enhance human intellectual, physical, and psychological capacities.”
In both instances, we confront the problem of measurement. What exactly is “aging” and how do we measure it? Is it just chronological age?
The measurement problem affects how we evaluate potential enhancements, both technically and ethically. Who gets to define enhancement and decide how to measure it? Deciding what to measure has social, political, and economic implications for individuals that are far more significant than they might first appear.
The Measurement Problem of Transhumanism
Deciding whether something is really an enhancement depends on what you measure, when, where, and how you measure it. This is The Measurement Problem of Transhumanism.
Consider any potential enhancement of a person. How can we tell whether applying a specific treatment or technology is an improvement or enhancement? We cannot always trust manufacturers’ claims, for example. What about technologies with unintended or undesirable side effects or risks, some of which may be unknown initially? Are these still correctly called “enhancements”?
Physical strength, for example, seems straightforward to measure. Formally, strength is defined as the maximum force one can exert under isometric conditions.
On closer inspection, however, measuring strength is anything but simple. First, when we talk about strength we must be referring to one or more specific muscle groups; otherwise we cannot say what the “isometric conditions” in the definition are, and those conditions differ between muscle groups. In general, muscle groups can work largely in isolation from one another during normal movements, so the strength of one group might not indicate, or correlate with, the strength of another.
Further, strength can be affected by various environmental and physiological factors; anything from the air temperature to the number of hours of sleep can influence muscle performance. We also have to be more specific about what we mean, because the term strength is commonly taken to mean absolute strength in a single lift, measured by the maximum weight lifted, say in a deadlift.
However, consider: who is really “stronger”, a person who can lift 100 kg once or a person who can lift 50 kg twenty times? What about someone who can lift 50 kg twenty times in half the time? Defining strength as the maximum amount lifted once settles the question, but that definition might not represent real-world performance on any interesting task.
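To make the point concrete, here is a minimal sketch that ranks the three lifters from the paragraph above under three different definitions of strength. The set durations, the 0.5 m bar travel, and the metric names are hypothetical assumptions chosen for illustration, not data from any study.

```python
from dataclasses import dataclass

G = 9.81              # gravitational acceleration, m/s^2
LIFT_HEIGHT_M = 0.5   # assumed bar travel per repetition (hypothetical)


@dataclass
class LiftRecord:
    name: str
    load_kg: float    # load lifted
    reps: int         # repetitions completed
    seconds: float    # time for the whole set (hypothetical)


def max_load(r: LiftRecord) -> float:
    """Strength as the heaviest load lifted once."""
    return r.load_kg


def total_volume(r: LiftRecord) -> float:
    """Strength as total load moved across the set (kg x reps)."""
    return r.load_kg * r.reps


def mean_power_w(r: LiftRecord) -> float:
    """Strength as mechanical work against gravity per second."""
    work_j = r.load_kg * G * LIFT_HEIGHT_M * r.reps
    return work_j / r.seconds


lifters = [
    LiftRecord("A", 100, 1, 5),    # 100 kg once
    LiftRecord("B", 50, 20, 80),   # 50 kg twenty times
    LiftRecord("C", 50, 20, 40),   # 50 kg twenty times in half the time
]

for metric in (max_load, total_volume, mean_power_w):
    ranking = sorted(lifters, key=metric, reverse=True)
    print(metric.__name__, [(r.name, round(metric(r), 1)) for r in ranking])
```

Under these assumptions A is “strongest” by maximum load, B and C tie on volume, and C wins on power: the answer is fixed by the choice of measure, not by the lifters.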
Another situation in which we encounter the measurement problem is when we view transhumanism as a strategy for design. When designing a product or system, we commonly examine alternative approaches and possibilities and evaluate the designs in terms of characteristics such as efficiency, opacity, cohesiveness, and so on. We would like to take various measurements of a proposed design and evaluate them against transhumanist objectives such as extending and enhancing life.
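One common way to turn several design measurements into a decision is a weighted sum. The sketch below assumes two hypothetical designs with made-up 0 to 10 scores (higher taken as better on every criterion) and shows that the “best” design flips when only the weights change, so whoever sets the weights effectively decides the outcome.

```python
# Hypothetical scores (0-10, higher assumed better) for two illustrative designs.
DESIGNS = {
    "design_x": {"efficiency": 9, "opacity": 3, "cohesiveness": 5},
    "design_y": {"efficiency": 5, "opacity": 8, "cohesiveness": 7},
}


def weighted_score(scores: dict, weights: dict) -> float:
    """Normalized weighted sum of criterion scores."""
    total_weight = sum(weights.values())
    return sum(scores[c] * w for c, w in weights.items()) / total_weight


def best_design(weights: dict) -> str:
    """Name of the design with the highest weighted score."""
    return max(DESIGNS, key=lambda name: weighted_score(DESIGNS[name], weights))


efficiency_first = {"efficiency": 3, "opacity": 1, "cohesiveness": 1}
equal_weights = {"efficiency": 1, "opacity": 1, "cohesiveness": 1}
print(best_design(efficiency_first), best_design(equal_weights))
```

The weighted sum is only one of many possible aggregation rules; it is used here because it makes the dependence on weights easy to see.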
Deciding What to Measure
The Measurement Problem of Transhumanism stems directly from the complexity of the human system. The human performance literature contains over 500 distinct measures of human performance, used to evaluate the functioning of the brain, the movement of the limbs, and many other physiological functions. But when we talk about enhancement, we generally refer to only one, or at most a modest bundle, of these available measures.
Science tells us that a living human being is the result of complex chains of billions of molecular interactions over time. These molecular interaction networks make up our entire being and are the building blocks of what we conventionally call both our body and our mind. In general we do not, and cannot, measure these processes directly; instead we use measures of larger, system-wide characteristics such as “temperature”, “strength” or “intelligence”.
In evaluating a given intervention, an experimenter chooses some measure of performance, introduces a treatment, and then reports the results. Well-structured experiments may also include a control group. Notice, however, that the choice of measured quantity is often left unexplained and entirely up to the experimenter.
Consider vision. We often hear about someone having “20/20” vision, but measuring vision is in reality a complex field with dozens of measures one might consider. For example, do we ask subjects to self-report what they can or cannot see, or do we measure electrical activity in the visual system? Vision is also affected by environmental factors; notably, the level of light and contrast in the visual field can radically alter visual performance.
Some measures may exaggerate or understate an effect. That is, an experimenter might measure a characteristic that misrepresents the treatment’s effect on other measures. An example of this problem is seen with transcranial direct current stimulation (tDCS), where reports of improved “on task” performance were later tempered by a study of “off task” performance showing that tDCS interfered with the subjects’ abilities.
Constructing fair measures of performance is challenging. In general, performance in an experiment depends on the subject’s normative performance, the effectiveness of the treatment or intervention, and other factors. Without care, these “other factors” can dominate the results.
Even with a fair measure, results may vary considerably depending on circumstances. If we evaluate a proposed enhancement in an unrealistic situation, our carefully measured benefits may simply be irrelevant to anything that matters.
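The decomposition above (normative performance plus treatment effect plus other factors) can be simulated directly. This sketch assumes an additive model, a hypothetical true benefit of 2 points, and a hypothetical penalty of 1.5 points per hour of lost sleep; the treated group happens to be tested after a short night. None of these numbers come from a real study.

```python
import random
from statistics import mean

rng = random.Random(0)
TRUE_EFFECT = 2.0             # hypothetical benefit of the treatment
SLEEP_PENALTY_PER_HOUR = 1.5  # hypothetical loss per hour of sleep below 8 h


def observed_score(normative: float, treated: bool, hours_slept: float) -> float:
    """Observed = subject's normative level + treatment effect + other factors + noise."""
    sleep_deficit = max(0.0, 8.0 - hours_slept)
    return (normative
            + (TRUE_EFFECT if treated else 0.0)
            - SLEEP_PENALTY_PER_HOUR * sleep_deficit
            + rng.gauss(0, 1))


N = 2000
# Control subjects slept 8 h; the treated group happened to be tested after 6 h of sleep.
control = [observed_score(rng.gauss(50, 5), False, 8.0) for _ in range(N)]
treated = [observed_score(rng.gauss(50, 5), True, 6.0) for _ in range(N)]

naive_effect = mean(treated) - mean(control)
print(f"naive estimate: {naive_effect:.2f}  true effect: {TRUE_EFFECT}")
```

Even with thousands of subjects, the naive difference of means lands near 2 - 3 = -1: the uncontrolled sleep difference outweighs, and reverses, a genuinely beneficial treatment.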
What are Measurements Anyway?
A measurement is an observation of a person made with an instrument. Instruments can range from a digital thermometer to a survey the person fills out. In both cases we can correctly speak of measurements.
A measurement may not be exact and indeed most measurements we use have some amount of error or “noise” associated with them. But not everything that appears to be a measurement is one.
For example, there is a method by which heart rate can be estimated from a sequence of digital images. However, the resulting heart rate estimate is not itself a measurement: the source images are the measurements, and an algorithm generates the heart rate estimate from them.
The result of a measurement depends not only on the underlying phenomenon being measured, but also on the method or apparatus used. We will get different estimates of body temperature by feeling someone’s forehead and guessing, by using a conventional mercury thermometer, and by using a modern digital one.
Measurement methods differ in a variety of ways, but in general we can measure, compare, and contrast the sensitivity and selectivity of different sensors and measurement systems. Such comparisons are often made, for example, using the receiver operating characteristic (ROC) curves of two sensors.
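To illustrate the distinction between measurement and estimate, here is a toy version of the idea: the per-frame brightness values are the measurements, and the heart rate is computed from them by picking the strongest frequency in a plausible band. The frame rate, the synthetic signal, and the 0.7 to 4 Hz band are assumptions for this sketch; real image-based methods involve considerably more processing.

```python
import math
import random

FPS = 30.0  # assumed camera frame rate


def synthetic_frame_means(bpm: float, seconds: float = 10.0, seed: int = 2) -> list:
    """Hypothetical per-frame mean skin brightness: a faint pulse plus sensor noise."""
    rng = random.Random(seed)
    pulse_hz = bpm / 60.0
    n_frames = int(seconds * FPS)
    return [100.0 + 0.5 * math.sin(2 * math.pi * pulse_hz * i / FPS) + rng.gauss(0, 0.2)
            for i in range(n_frames)]


def estimate_bpm(samples: list, lo_hz: float = 0.7, hi_hz: float = 4.0,
                 step_hz: float = 0.01) -> float:
    """Return 60x the frequency with the most spectral power in the heart-rate band."""
    avg = sum(samples) / len(samples)
    centered = [s - avg for s in samples]  # remove the constant brightness level
    best_hz, best_power = lo_hz, -1.0
    for k in range(int(round((hi_hz - lo_hz) / step_hz)) + 1):
        f = lo_hz + k * step_hz
        re = sum(x * math.cos(2 * math.pi * f * i / FPS) for i, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * f * i / FPS) for i, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_hz, best_power = f, power
    return best_hz * 60.0


frames = synthetic_frame_means(bpm=72)
print(f"estimated heart rate: {estimate_bpm(frames):.0f} bpm")
```

Changing the band, the window length, or the noise model changes the estimate while the underlying measurements stay the same.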
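A small simulation shows how the method shapes the result. Each temperature method below is given an illustrative bias and noise level (hypothetical values, not real instrument specifications); averaging many readings shrinks the noise but leaves each method’s bias untouched.

```python
import math
import random
from statistics import mean, stdev

rng = random.Random(1)
TRUE_TEMP_C = 37.0  # hypothetical true body temperature

# Illustrative (bias, noise SD) in degrees C for each method; not real instrument specifications.
METHODS = {
    "hand on forehead": (0.0, 1.0),
    "mercury thermometer": (-0.1, 0.2),
    "digital thermometer": (0.05, 0.1),
}


def estimate(method: str, n: int) -> tuple:
    """Average n simulated readings; return (mean, standard error of the mean)."""
    bias, noise_sd = METHODS[method]
    readings = [TRUE_TEMP_C + bias + rng.gauss(0, noise_sd) for _ in range(n)]
    return mean(readings), stdev(readings) / math.sqrt(n)


for method in METHODS:
    m, se = estimate(method, 25)
    print(f"{method:20s} mean={m:.2f} C  standard error={se:.2f} C")
```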
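An ROC curve plots the true positive rate (sensitivity) against the false positive rate (one minus specificity) as a decision threshold sweeps over a sensor’s outputs; the area under the curve (AUC) summarizes it in one number. The sketch below uses hypothetical scores for two sensors and computes the AUC with the trapezoidal rule.

```python
def roc_points(pos_scores: list, neg_scores: list) -> list:
    """(false positive rate, true positive rate) for every decision threshold."""
    points = [(0.0, 0.0)]
    for t in sorted(set(pos_scores + neg_scores), reverse=True):
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)  # sensitivity
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)  # 1 - specificity
        points.append((fpr, tpr))
    return points


def auc(points: list) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical sensor outputs for cases where the condition is present / absent.
sensor_a = ([0.9, 0.8, 0.7, 0.6], [0.5, 0.4, 0.3, 0.2])
sensor_b = ([0.9, 0.6, 0.5, 0.3], [0.7, 0.4, 0.35, 0.2])

for name, (pos, neg) in (("A", sensor_a), ("B", sensor_b)):
    print(name, round(auc(roc_points(pos, neg)), 3))
```

Sensor A separates the two cases perfectly (AUC 1.0), while sensor B’s overlapping outputs give an AUC of 0.6875.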
But even using an identical measurement protocol, a badly constructed experiment can result in confused or nonsensical results. As a relevant example, in testing an enhancement to night vision performance, it is critical to carefully control subjects’ exposure to bright light and especially sunlight for as much as 48 hours prior to an experiment.
Exposure to bright sunlight also has a cumulative and adverse effect on dark adaptation. Exposure to intense sunlight for as little as two to five hours decreases visual sensitivity for up to five hours. In addition, the rate of dark adaptation and the degree of night visual acuity decrease and these cumulative effects may persist for several days.
So a measurement depends not only on the specific measurement device or protocol, but possibly also on many other factors, including, in some cases, what the subjects were doing before the experiment.
A/B Testing, Nootropics, and Privacy
An important case many readers will be familiar with is known as “A/B Testing”. In this case, we have two treatments and we want to decide between them. An example would be deciding between two different but similar web page designs, but perhaps more relevant here, consider comparing the benefits of two similar nootropic substances.
The problem in most A/B testing scenarios is simply a lack of available test subjects. When we want conclusions that apply to a large population, say the entire population of the United States, collecting enough samples for a statistically valid inference about one treatment’s effectiveness over another generally involves significant expense and complexity, especially when the difference between the two treatments is small.
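A rough sense of how many subjects are needed comes from the standard normal-approximation sample-size formula for comparing two means. This sketch assumes a two-sided test, equal group sizes, a common standard deviation, and conventional defaults of 5% significance and 80% power; it is an approximation, not a substitute for a proper power analysis.

```python
import math
from statistics import NormalDist


def n_per_group(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate subjects needed in EACH arm to detect a mean difference delta.

    Normal approximation for a two-sided, two-sample comparison with equal group
    sizes and a common standard deviation sigma.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


for effect in (0.5, 0.2, 0.1):  # difference between nootropics, in units of sigma
    print(effect, n_per_group(delta=effect, sigma=1.0))
```

The required sample grows with the inverse square of the effect size: halving the difference you hope to detect roughly quadruples the subjects needed in each arm.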
Simply collecting the data is problematic, and depending on the type of data collected there are also privacy and data ownership issues. While some open source efforts to collect performance data do exist, depending on the project, you may have to do it yourself.
Performance measurement and monitoring could, if done in a sophisticated manner, affect employability. Would you rather employ someone measured to be a better performer or a worse one? So even something as simple as comparing nootropic performance using this sort of open database immediately puts us in the “deep end” of personal privacy and medical data management. Who gets the data and what they do with it matters, so we have to be careful.
Organizations often do not do proper A/B testing and instead fall back on “indicative” tests, which do not produce statistically significant results. This is a risky and potentially misleading practice; examples include an A/B test conducted using only the employees of the company developing the product, or using readily available graduate students and postdocs. Also, with A/B testing it often isn’t possible to expose subjects to both treatments without affecting the end results. With small groups this is a problem, because it means the available subjects must be divided in half.
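An exact permutation test shows why small “indicative” comparisons rarely settle anything. The sketch assumes hypothetical scores for five subjects on each of two nootropics, then checks how often a random relabeling of the ten subjects produces a difference in means at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(a: list, b: list) -> float:
    """Exact two-sided permutation test on the difference in group means."""
    pooled = a + b
    observed = abs(mean(a) - mean(b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-9:
            extreme += 1
    return extreme / total


# Hypothetical test scores: five subjects per nootropic.
nootropic_1 = [12, 15, 14, 10, 13]
nootropic_2 = [11, 12, 13, 9, 10]
p = permutation_p_value(nootropic_1, nootropic_2)
print(f"difference in means: {mean(nootropic_1) - mean(nootropic_2):.1f}, p = {p:.3f}")
```

With these numbers the 1.8-point advantage looks promising, yet about one in five random splits of the same ten people produces a gap at least that large (p of roughly 0.198).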
The Two Kinds of Experiments
Two fundamental types of experiments are commonly discussed. First, there are experiments involving populations or groups of individuals, described by statistical characteristics such as “average life expectancy”. Second, there are experiments conducted on non-aggregated individuals. These are essentially life stories, or parts of them, composed of sequences of measurement events recorded over time. An example would be the birth and death dates of a set of people.
When conducting an experiment, the experimenter chooses one type or the other. That isn’t a problem in itself; however, if the experimenter makes the wrong choice or fails to understand the difference between the two types, they may reach misleading conclusions. For example, an increase in life expectancy won’t necessarily increase the lifespan of every individual in the measured population. Assuming that it does is a fallacy, although in general individuals in populations with higher life expectancies do indeed have longer lifespans. But you cannot conclude anything about a particular individual from the population statistic alone.
While individual stories can be used to construct population statistics, the reverse transformation isn’t possible in general. I can compute the average life expectancy from a population of lifespans, but any method that determines a specific individual’s lifespan from the population average can only produce an estimate or prediction. Another problem is that detailed individual measurements are often available only in small numbers, which are insufficient for estimating the parameter for a larger population.
The conduct of individual, or n=1, experiments is quite different from that of experiments using a sample drawn from a population. When we are making an estimate for a population, we have to confront the issue of sampling. How we select individuals to measure can strongly bias our estimates for the larger population: if we use a biased sample, the result will generally be biased as well. This is a subtle and complex issue beyond the scope of this article.
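A few hypothetical lifespans make both points. In the first example an intervention raises the average from 74 to 80 years while one individual loses ten years; in the second, a completely different cohort has exactly the same average, so the average alone cannot tell us which individual lives were behind it.

```python
from statistics import mean

# Hypothetical lifespans (years) of the same five people without and with an intervention.
before = {"p1": 70, "p2": 72, "p3": 74, "p4": 76, "p5": 78}
after = {"p1": 80, "p2": 82, "p3": 64, "p4": 86, "p5": 88}

# A different hypothetical cohort with exactly the same average lifespan as `before`.
other_cohort = [50, 74, 78, 84, 84]


def worse_off(before: dict, after: dict) -> list:
    """Individuals whose own lifespan decreased, whatever the average did."""
    return [person for person in before if after[person] < before[person]]


print("mean before:", mean(list(before.values())), "mean after:", mean(list(after.values())))
print("individuals worse off:", worse_off(before, after))
print("same mean, different lives:", mean(list(before.values())) == mean(other_cohort))
```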
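A simple simulation shows how self-selection biases an estimate. The population scores and the rule that higher scorers are more likely to volunteer are hypothetical assumptions; the point is only that a random sample tracks the population mean while a volunteer sample does not.

```python
import random
from statistics import fmean

rng = random.Random(3)
# Hypothetical population of 100,000 trait scores.
population = [rng.gauss(100, 15) for _ in range(100_000)]


def volunteer_probability(score: float) -> float:
    """Assumed self-selection: higher scorers are more likely to volunteer."""
    return min(1.0, max(0.0, (score - 70) / 60))


random_sample = rng.sample(population, 1000)
volunteer_sample = [x for x in population if rng.random() < volunteer_probability(x)]

print(f"population mean: {fmean(population):.1f}")
print(f"random sample:   {fmean(random_sample):.1f} (n={len(random_sample)})")
print(f"volunteers:      {fmean(volunteer_sample):.1f} (n={len(volunteer_sample)})")
```

The volunteer sample is roughly fifty times larger than the random one, but size does not cure the bias: its mean sits several points above the population’s.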
In individual experiments the main issue is baselining. Comparing a treated individual to their prior performance requires that we have measured what their prior baseline performance was. But how do we define the baseline?
For example, do we measure once, right before the application of a treatment? What if the baseline performance is variable and subject to environmental factors such as time of day or air temperature?
Even deciding how to take the baseline measurement can be problematic.
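The sketch below illustrates the time-of-day question. It assumes a hypothetical performance score with a daily rhythm peaking mid-afternoon and a treatment that does nothing at all. Comparing a 16:00 follow-up against a single 06:00 baseline manufactures a large “improvement”; comparing against repeated baseline readings taken at 16:00 on earlier days does not.

```python
import math
import random
from statistics import mean

rng = random.Random(4)


def performance(hour: float, treatment_effect: float = 0.0) -> float:
    """Hypothetical score with a daily rhythm peaking at 15:00, plus measurement noise."""
    daily_rhythm = 5.0 * math.cos(2 * math.pi * (hour - 15) / 24)
    return 50.0 + daily_rhythm + treatment_effect + rng.gauss(0, 0.5)


# A treatment with NO real effect, applied at 06:00 and evaluated at 16:00.
single_baseline = performance(6)              # one reading just before treatment
post_treatment = performance(16, treatment_effect=0.0)

# A more careful baseline: repeated 16:00 readings taken on earlier days.
matched_baseline = [performance(16) for _ in range(10)]

print(f"apparent gain vs single 06:00 baseline: {post_treatment - single_baseline:+.1f}")
print(f"apparent gain vs matched 16:00 baseline: {post_treatment - mean(matched_baseline):+.1f}")
```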
Beyond the Basics: The Extended Self, Fusion, and Symbiosis
Transhumanism introduces another challenge in performance measurement: the extended self. That is, what constitutes a person may not be entirely contained within what we conventionally consider their “body”. Consider, for example, a person with an external, cloud-based electronic hippocampus.
Some of their memories are stored locally as yours and mine are today. But others are offloaded to the cloud. When the person retrieves a memory, they may not even know whether it was local or remotely stored.
Now consider a novel memory enhancement applied to this already extended person. We cannot correctly measure the enhancement by taking measurements only on or around the person’s biological body. A correct measurement protocol for an extended person must therefore include measurements not only of the biological body but also of the enhancements already in place.
Another area of complexity involves enhancements that combine, or “fuse”, information from multiple sources. Because sources can interfere with one another or be uncorrelated, and because noise can spread between improperly combined source estimates, evaluating such architectures adds another level of complexity.
Inevitably, we are now also considering connected systems that are part of the global information fabric. Beyond the Web and the Internet of Things is the Internet of Everything, where we and our digital extensions are all connected seamlessly and ubiquitously. Evaluating enhancements in a connected world requires an even deeper level of investigation, because we cannot always restrict our analysis to a single individual.
For example, our devices are manufactured using processes that produce toxic waste and pollution, and that employ workers under unacceptable, inhumane conditions. A technology may be correctly called an enhancement only if you ignore its implications elsewhere.
Beyond this, when considering a symbiotic system, measurements of the performance of one part may not indicate the performance of the system as a whole. A man-machine system can in some cases outperform either the man or the machine alone. But if you don’t measure the entire system, those benefits will remain unmeasured and unknown.
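One concrete way noise spreads through an improperly combined estimate is illustrated below with inverse-variance weighting, a standard textbook fusion rule (not necessarily the architecture discussed in the cited fusion papers). The rule’s usual variance formula assumes the two sources have independent errors; if their errors are correlated, the fused estimate is far less precise than it claims. The readings and variances are hypothetical.

```python
import math


def fuse(x1: float, var1: float, x2: float, var2: float) -> tuple:
    """Inverse-variance weighted average of two estimates.

    The returned variance is correct ONLY if the two estimates have independent errors.
    """
    w1, w2 = 1.0 / var1, 1.0 / var2
    fused = (w1 * x1 + w2 * x2) / (w1 + w2)
    claimed_var = 1.0 / (w1 + w2)
    return fused, claimed_var


def actual_var(var1: float, var2: float, rho: float) -> float:
    """True variance of the same weighted average when the errors have correlation rho."""
    a = (1.0 / var1) / (1.0 / var1 + 1.0 / var2)   # weight on source 1
    b = 1.0 - a                                   # weight on source 2
    return a * a * var1 + b * b * var2 + 2 * a * b * rho * math.sqrt(var1 * var2)


estimate, claimed = fuse(10.0, 4.0, 14.0, 4.0)
for rho in (0.0, 0.5, 1.0):
    print(f"rho={rho}: claimed variance {claimed}, actual variance {actual_var(4.0, 4.0, rho)}")
```

With fully correlated errors (rho = 1) the fused estimate is no better than either source alone, yet the rule reports twice the precision; an evaluation that only checks the claimed variance would never notice.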
Does the Apple Watch make us smarter or stupider? And how would we construct a valid experiment to decide?
The Measurement Problem of Transhumanism suggests that it isn’t easy to construct these sorts of experiments. Transhumanist researchers should carefully construct their experiments and consider these issues in their designs.
What is measured and how the measurements are performed can have a large influence on scientific results, the design of products and even the implementation of government policies. As such it is a technical topic that transhumanists should become more familiar with.
Evaluating transhumanist proposals and designs hinges on measuring their benefits. But beyond this, consider the implications of measurement for ideas such as morphological freedom and cognitive liberty. What is measured matters.
What is cognitive liberty and how would you measure it? Is it just the ability to change your mind? Who gets to decide how to measure liberty?
How would one quantify, compare and contrast various possible moral enhancements? Does it matter who does the measuring?
I would say it does.
More, M. & Vita-More, N. (Eds.) The Transhumanist Reader: Classical and Contemporary Essays on the Science, Technology, and Philosophy of the Human Future. (New York: Wiley-Blackwell Publishing, 2013).
ASU Lab Invents Non-Invasive Measure of Glycogen, http://transforming-science.com/human-performance-lab-validates-non-invasive-method-to-measure-muscle-glycogen/
Bostrom, Nick, and Anders Sandberg. “Cognitive enhancement: methods, ethics, regulatory challenges.” Science and Engineering Ethics 15.3 (2009): 311-341.
Whitmire, Scott A. Object oriented design measurement. John Wiley & Sons, Inc., 1997.
No, Biohackers Did Not Just Discover Eyedrops That Give You Night Vision, https://wp.dagiopia.net/2015/03/30/no-biohackers-did-not-just-discover-eyedrops-that-give-you-night-vision-and-using-them-might-damage-your-eyesight/
Rothman, Peter L., and Richard V. Denton. “Fusion or confusion: knowledge or nonsense?” Orlando’91, Orlando, FL. International Society for Optics and Photonics, 1991.
Rothman, Peter L., and Stephen G. Bier. “Evaluation of sensor management systems.” Aerospace and Electronics Conference, 1989. NAECON 1989., Proceedings of the IEEE 1989 National. IEEE, 1989.
Abdel-Malek, Karim, et al. “Human Performance Measures: Mathematics.” Department of Mechanical Engineering, The University of Iowa, Technical report (2005): 1-27.
Sternberg, Daniel A., et al. “The largest human cognitive performance dataset reveals insights into the effects of lifestyle factors and aging.” Frontiers in human neuroscience 7 (2013).
Fregni, Felipe, et al. “Anodal transcranial direct current stimulation of prefrontal cortex enhances working memory.” Experimental brain research 166.1 (2005): 23-30.
U.S. Army. Field Manual No. 3-04.301 (1-301): Aeromedical Training for Flight Personnel.
Wayman, James L. “Technical testing and evaluation of biometric identification devices.” Biometrics. Springer US, 1996. 345-368.
Starkey, Guy. “Estimating audiences: Sampling in television and radio audience research.” Cultural Trends 13.1 (2004): 3-25.
Dass, Sarat C., Yongfang Zhu, and Anil K. Jain. “Validating a biometric authentication system: Sample size requirements.” Pattern Analysis and Machine Intelligence, IEEE Transactions on 28.12 (2006): 1902-1919.