
Quantified Self How-To: Designing Self-Experiments

[Editor's note: Many readers of H+ are engaged in various types of self-experimentation. Unfortunately, many people doing these self-experiments do not properly structure, log, or report their experimental results. This limits the usefulness of self-experimentation to the wider Transhumanist community. Certain forms of self-experimentation can be risky, and this is not an endorsement of any specific self-experiment. However, if you are going to experiment on yourself, it makes sense to do it correctly.

Here is a beginner's guide to experimental design, specifically tailored for self-experimentation, to get you started. This article was previously published as a series of blog posts by Konstantin Augemberg of the Measured Me blog. Konstantin has a mission: to prove that any aspect of everyday life can be quantified and tracked on a regular basis. He is also trying to test empirically whether a person's life can be optimized by identifying and manipulating the internal and external variables that influence our everyday behavior, mind and psyche. Follow his personal self-quantification and self-optimization experiment at]

Sometimes you would like to know whether a certain variable X in your life affects another variable Y, so that you can use it to your advantage. Perhaps you are curious whether meditating in the evening helps to minimize stress on the following day, or whether that supplement you have been taking really helps you lose weight faster. The problem with turning to scientific studies and online reviews is that what worked for others may not necessarily work for you. The best way to find out whether a cause-effect relationship between X and Y exists for you is to conduct your own personal Quantified-Self experiment.

The first step is to formulate the research hypothesis. What exactly are you trying to achieve? The hypothesis should be precise and to the point, something like this: “If I do X, then there will be a change in Y.” For example: “If I meditate in the evening, my stress levels will be lower on the following day.” Or: “If I take supplement Z, my rate of weight loss will increase.”

The second step is operationalization: selecting the proper instruments and measurement procedures. Will your measurement rely on subjective tools (e.g., psychological questionnaires), more objective ones (e.g., an electronic or balance scale, apps that measure reaction time or pulse, gadgets that measure sleep quality), or both? If you rely on subjective measures, do some research to see whether any established, scientifically sound questionnaires already exist.

You will also need to establish a timeframe. Unlike regular experiments, which involve multiple participants, self-experiments have only one subject: you. Hence, in order to accumulate enough data points, a self-experiment may need to last at least several days. Ultimately, it all depends on how quickly Y changes: for instance, you may capture changes in mood and sleep quality sooner than changes in weight and body fat.

Self-Tracking Basics: Operationalization

One of the major challenges you face before you even start self-tracking is defining, in plain terms, how to measure the variables of interest. In the experimental social sciences there is a special term for this: operationalization. It means defining the variable of interest via empirical (measurable) qualities. In other words, you define it in terms of the specific operations by which you will measure it, and then stick to that operational routine. Suppose, for instance, that you would like to track the quality of your sleep. You can measure it in many different ways.

You can ask yourself upon awakening to rate your sleep on a five-point scale, from “not good at all” to “excellent”. You can also count how many times you woke up during the night, or record how long it takes you to get out of bed after you wake up. If you prefer more objective measures, you may instead rely on a mobile phone (Sleep Cycles or Smart Alarm clock apps) or a more advanced device (BodyMedia, Zeo, FitBit) to collect certain sleep statistics (e.g., the length of REM sleep).

My word of advice: instead of relying on intuition and your personal understanding of the subject, turn to professionals and academics, and do some diligent research (e.g., using Google Scholar) before you start logging data. Trust me: no matter what you are trying to measure, chances are that someone out there has already done empirical research on it. Relying on solid scientific research is especially important if you are trying to measure a psychological trait or another intangible construct. You want valid and reliable questions for measuring the variable of interest, questions that have already been tested successfully on many people. Otherwise you risk ending up with messy, unreliable data that won’t provide you with any meaningful insights.
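To make the idea of an operational routine concrete, here is a minimal Python sketch of one possible operational definition of sleep quality, built from the three self-report measures described above. The `SleepEntry` class, its field names and the sample entries are hypothetical; the point is that every morning's record is taken the same way, and an entry that breaks the definition is rejected instead of being silently logged.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SleepEntry:
    """One morning's record under a fixed, hypothetical definition of sleep quality."""
    day: date
    rating: int             # 1 = "not good at all" ... 5 = "excellent"
    awakenings: int         # times I woke up during the night
    minutes_to_get_up: int  # minutes from waking up to getting out of bed

    def __post_init__(self):
        # Enforce the operational routine: reject values outside the definition.
        if not isinstance(self.rating, int) or not 1 <= self.rating <= 5:
            raise ValueError("rating must be a whole number from 1 to 5")
        if self.awakenings < 0 or self.minutes_to_get_up < 0:
            raise ValueError("counts and durations cannot be negative")


# Invented example entries, for illustration only.
sleep_log = [
    SleepEntry(date(2012, 5, 1), rating=4, awakenings=1, minutes_to_get_up=10),
    SleepEntry(date(2012, 5, 2), rating=2, awakenings=3, minutes_to_get_up=25),
]
```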

Self-Tracking Basics: Minimizing Measurement Error

One of the first “reality checks” you have to accept when starting a self-tracking project is the existence of measurement error. No matter how technologically advanced your measurement and recording tools are, they will never capture the “true value” of the object or trait you are trying to measure. Take, for instance, body weight. If you weigh yourself several times throughout the day and then chart the results, you will notice considerable fluctuation. Your weight will naturally go up right after breakfast and even more after lunch, and the scale at your gym will most likely show different results than the scale at home. These differences are caused by so-called “systematic” error.

This is error that occurs due to misuse of instruments and changes in the measurement procedure. In the example above, you weighed yourself at different times of the day, using different scales, and so on. Every time you introduce a change in the measurement procedure, you affect the results. Luckily, systematic errors can usually be eliminated or minimized by standardizing the measurement process. In other words, you fight systematic error by introducing a system: find the optimal conditions under which the measurement is assumed to be most accurate, and from then on stick to the same routine.

Another type of error that will always follow you in your self-tracking endeavors is “random” error. This is noise that occurs due to uncontrollable factors and the nature of the object you are trying to measure. I would say that the more intangible the object, the larger the random error. For instance, latent characteristics such as emotions, mood and other psychological traits are the most difficult to capture. In this case, taking multiple measurements (or, better, using multiple instruments) and averaging the results can help to minimize the error.

Finally, remember: self-tracking projects are almost always about relative comparisons. In other words, what matters is not the absolute value of whatever you measure, but the relative change in its values across time periods or treatments. If you keep the measurement routine the same, you will still be able to capture the change. So even if your bathroom scale consistently underestimates your weight by 1 lb, you will still be able to see your progress, as long as you keep using the same scale.
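The point about relative comparisons can be checked with a small simulation. This Python sketch assumes a hypothetical true weight that drops by 2 lbs from one week to the next, a scale that always reads 1 lb low (systematic error), and Gaussian noise on every reading (random error, with an assumed SD of 0.5 lb). Averaging three readings per day damps the noise, and because the same biased scale is used in both weeks, the bias cancels out of the week-to-week change.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is reproducible

BIAS = -1.0      # systematic error: the scale always reads 1 lb low (assumed)
NOISE_SD = 0.5   # random error of a single reading, in lbs (assumed)


def reading(true_weight):
    """One weigh-in: true value + constant bias + random noise."""
    return true_weight + BIAS + random.gauss(0, NOISE_SD)


def daily_value(true_weight, k=3):
    """Average k readings taken under the same routine to reduce random error."""
    return statistics.mean(reading(true_weight) for _ in range(k))


week_1 = [daily_value(180.0) for _ in range(7)]  # hypothetical true weight: 180 lbs
week_2 = [daily_value(178.0) for _ in range(7)]  # hypothetical true weight: 178 lbs

# Both weeks share the same bias, so it drops out of the difference.
change = statistics.mean(week_2) - statistics.mean(week_1)
print(f"estimated change: {change:+.2f} lbs (true change: -2.00 lbs)")
```

Each weekly mean is about 1 lb too low, yet the estimated change stays close to the true 2 lb drop.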

Self-Tracking Basics: Experimental Design

Finally, you need to choose a design for your experiment. Here are the three most common experimental designs for self-experimentation:


(A-B) design

The simplest of all, this design is also considered the “weakest” when it comes to capturing causality. A and B are experimental phases. Phase A is the “baseline”, during which you measure the Y variable under normal conditions. For instance, you spend a week or two measuring your weight, sleep quality, stress levels, etc. every day. Phase B is the “treatment” phase, during which you introduce your variable X (you meditate every evening, take a weight-loss supplement, etc.) and continue measuring Y. At the end, you compare the levels of Y during both phases: Did you lose more weight during phase B? Were your stress levels lower during phase B? Based on the findings, you draw conclusions about the effectiveness of the treatment.
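The end-of-experiment comparison in an A-B design boils down to comparing the mean of Y in each phase. A minimal Python sketch, using invented daily stress ratings and a hypothetical `compare_phases` helper:

```python
from statistics import mean

# Invented daily stress ratings (1-10), for illustration only.
phase_a = [6, 7, 5, 6, 7, 6, 6]  # baseline week: no evening meditation
phase_b = [5, 4, 5, 6, 4, 5, 4]  # treatment week: meditating every evening


def compare_phases(baseline, treatment):
    """Return the baseline mean, the treatment mean and their difference (B - A)."""
    m_a = mean(baseline)
    m_b = mean(treatment)
    return m_a, m_b, m_b - m_a


m_a, m_b, change = compare_phases(phase_a, phase_b)
print(f"A: {m_a:.2f}  B: {m_b:.2f}  change: {change:+.2f}")
```

A lower mean in phase B is consistent with the hypothesis, but on its own it does not tell you whether the difference is larger than day-to-day noise.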


(A-B-A) design

This design is considered more advanced, as it lets you capture the changes in Y both before and after the treatment. For instance, you spend the first week going to bed normally, the second week meditating before you go to sleep, and the third week going back to your regular regimen. Looking at the changes in your stress levels across the three weeks (if there are any) helps you conclude whether the meditation works and how long its effects last.
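With an A-B-A design you would typically keep one log in time order and label each day with its phase. This Python sketch (invented data, hypothetical `phase_means` helper) uses `itertools.groupby` to compute the mean of Y for each run of consecutive days, so the two baseline phases stay separate instead of being merged into one “A” group.

```python
from itertools import groupby
from statistics import mean

# Invented day-by-day log of (phase, stress rating), in time order.
log = [("A", 6), ("A", 7), ("A", 6),
       ("B", 5), ("B", 4), ("B", 4),
       ("A", 5), ("A", 6), ("A", 6)]


def phase_means(entries):
    """Mean of Y for each run of consecutive days with the same phase label."""
    return [(label, mean(y for _, y in run))
            for label, run in groupby(entries, key=lambda entry: entry[0])]


for number, (label, m) in enumerate(phase_means(log), start=1):
    print(f"phase {number} ({label}): mean stress {m:.2f}")
```

Comparing the first and last A phases shows whether Y returns toward its baseline once the treatment stops.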



(A-B-A-B) design

This design may be useful if you want to see whether the intensity of treatment B is associated with the intensity of the outcome, for instance, whether spending more time meditating will help you reduce stress even more. During the fourth phase, you may want to meditate more than once a day, or double the length of your evening meditation. You then compare the changes in stress levels following both treatment phases to draw a conclusion. In another version of the A-B-A-B design, the treatment in one of the two treatment phases is replaced with a placebo (e.g., similar-looking harmless pills instead of the weight-loss supplement; of course, during both treatment phases you should not know which pills you are taking). Longer versions of this design, A-B-A-B-A-B and A-B1-A-B2-A-B3 (where B1, B2 and B3 are different versions of the treatment, e.g., different types of meditation), are also commonly used.
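For the placebo variant, the order of active and placebo phases should be decided at random and kept hidden from you until all the data are in. Here is a minimal Python sketch of that randomization; `blinded_assignment` is a hypothetical helper, and in practice someone else (or a sealed envelope) would hold the key and prepare identical-looking pills for each phase.

```python
import random


def blinded_assignment(seed=None):
    """Randomly assign 'active' and 'placebo' to the two B phases of A-B-A-B.

    Returns the phase sequence you follow and a key that stays sealed
    (held by a helper) until all measurements have been logged.
    """
    rng = random.Random(seed)
    contents = ["active", "placebo"]
    rng.shuffle(contents)
    phases = ["A", "B", "A", "B"]
    key = {"first B phase": contents[0], "second B phase": contents[1]}
    return phases, key


phases, sealed_key = blinded_assignment()
print("Phases to follow:", " -> ".join(phases))
# Only after the last measurement: print(sealed_key)
```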

Now that you have chosen the instruments and design for your experiment, you can develop a protocol: detailed instructions on how and when to administer treatments and take measurements. In 90% of cases, the success of an experiment depends on sticking to the protocol: dedication and consistency are key. And remember: negative results are results, too. At least you will know that the super-duper X everyone is swearing by does not work for your Y. Which means that the search for your own perfect X continues!
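A protocol can be as simple as a day-by-day table saying which phase you are in, what to do and what to measure. This Python sketch expands a list of phases into such a table; the `build_protocol` name, the start date, the one-week phase lengths and the instructions are all hypothetical.

```python
from datetime import date, timedelta


def build_protocol(start, phases, measurement):
    """Expand (label, days, treatment) phases into one (date, label, treatment, measurement) row per day."""
    rows = []
    day = start
    for label, n_days, treatment in phases:
        for _ in range(n_days):
            rows.append((day, label, treatment, measurement))
            day += timedelta(days=1)
    return rows


# Hypothetical A-B-A-B meditation protocol, one week per phase.
plan = [
    ("A", 7, "no meditation"),
    ("B", 7, "meditate 15 minutes before bed"),
    ("A", 7, "no meditation"),
    ("B", 7, "meditate 15 minutes before bed"),
]
protocol = build_protocol(date(2012, 6, 4), plan, "rate today's stress (1-10) at 9 pm")

for day, label, treatment, measurement in protocol[:3]:
    print(day.isoformat(), label, treatment, "|", measurement)
```

Printing the whole table (or exporting it to a spreadsheet) gives you a checklist to follow every day.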

Personal Analytics 101: Testing Differences in Your Data

Whether you are conducting a self-experiment or tracking some variables in your life simply out of curiosity, eventually you will want to look at the data and examine it for meaningful patterns. One of the most common research questions is testing differences: you would like to see whether a given variable differs with respect to a certain “grouping” aspect, and whether these differences are statistically significant. For instance, you may want to see whether you sleep better on nights after a gym workout, or whether a certain diet helps you lose more weight. In this post, I will provide simple step-by-step instructions for conducting a difference test that can easily be done in Excel. I promise to keep the demonstration basic and as non-technical as possible!

For illustrative purposes, I will be using actual data from my current personal tracking project, #fitsperiment. For the past 2 weeks, my fitness regimen has included days on which I would bike to and from work and take a walk after lunch, and days on which I would do a quick morning workout and then go to the gym during lunch and after work. Based on what my BodyMedia dashboard has been showing so far, I suspect that I burn more calories on “bike & walk” days than on “gym” days. This is how my data looks. The average number of calories burned on “gym” days is 3043, versus 3461 on “bike & walk” days. It looks like I burn a whopping 400 more calories on “bike & walk” days! But is this difference statistically significant? If you remember your intro stats course, you will probably suggest the good old t-test, which is readily available in Excel. To which I will respond: bad idea. You see, Quantified-Self data are somewhat different from what you may have encountered in a Psychology 101 course.

To check for statistical significance, you will need a more robust, non-parametric approach that is appropriate for single-subject experimental data with small sample sizes. I recommend a non-parametric version of Hedges' g, a standardized effect size, computed on the ranks instead of the actual values (if you like stats and need more technical details, see the end of this post). To calculate g and check whether it is statistically significant, follow these steps:

Preparing Essential Components of the Formula.
Step 1a. Combine both groups into one, and calculate ranks for the caloric expenditure values using Excel's RANK function (make sure to rank values in ascending order). Then split the ranks by group.
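Step 1a can also be reproduced outside Excel. This Python sketch pools both groups, ranks the values in ascending order (the smallest value gets rank 1) and splits the ranks back by group. One deliberate difference: tied values get the average of their ranks, as Excel's RANK.AVG does, whereas the RANK function used above gives tied values the same, lower rank. The function names are hypothetical.

```python
def ascending_ranks(values):
    """Rank values from smallest (1) to largest; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def rank_by_group(group_1, group_2):
    """Pool both groups, rank the pooled values, then split the ranks back by group."""
    ranks = ascending_ranks(list(group_1) + list(group_2))
    return ranks[:len(group_1)], ranks[len(group_1):]
```

Usage: `gym_ranks, bike_ranks = rank_by_group(gym_calories, bike_calories)`, where the two (hypothetical) lists hold the daily calorie values for each type of day.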

Step 1b. For each group, add the number of data points (n), the mean (using Excel's AVERAGE function) and the standard deviation, SD (using the STDEV function).

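The same summary numbers can be computed with Python's standard `statistics` module. Note that `statistics.stdev` is the sample standard deviation (n - 1 denominator), which matches Excel's STDEV. The `describe` name and the ranks below are invented for illustration.

```python
from statistics import mean, stdev


def describe(ranks):
    """Return n, mean and sample standard deviation (n - 1 denominator, like Excel's STDEV)."""
    return len(ranks), mean(ranks), stdev(ranks)


# Invented ranks for one group, for illustration only.
n, m, sd = describe([2.0, 4.0, 6.0, 8.0])
print(f"n = {n}, mean = {m:.2f}, SD = {sd:.2f}")
```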

Computing Hedges' g.
Step 2a. Using the cells highlighted in light blue and the following formula, compute the pooled SD:
$$SD_{pooled} = \sqrt{\frac{(n_E - 1)\,SD_E^2 + (n_C - 1)\,SD_C^2}{n_E + n_C - 2}}$$

In my example, the standard deviations $SD_E$ and $SD_C$ are in cells I16 and J16, respectively, and the corresponding $n_E$ and $n_C$ are in cells I14 and J14. I ended up with a pooled SD of approximately 1.80.
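The pooled SD from step 2a is a one-liner in most languages. Below is a Python version (the `pooled_sd` name is hypothetical), with the equivalent Excel formula for the cells in my example shown in the docstring.

```python
from math import sqrt


def pooled_sd(sd_e, n_e, sd_c, n_c):
    """Pooled SD: each group's variance weighted by its degrees of freedom (n - 1).

    Excel equivalent for the cells in the example:
    =SQRT(((I14-1)*I16^2+(J14-1)*J16^2)/(I14+J14-2))
    """
    return sqrt(((n_e - 1) * sd_e ** 2 + (n_c - 1) * sd_c ** 2) / (n_e + n_c - 2))
```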

Step 2b. Using the cells highlighted in light red, with the means $M_E$ and $M_C$ in cells I15 and J15 and the total N in cell G14, you can now calculate the value of g:

$$g = \frac{M_E - M_C}{SD_{pooled}} \times \frac{N - 3}{N - 2.25} \times \sqrt{\frac{N - 2}{N}}$$

My g was approximately 1.76. (If you end up with a negative sign, ignore it: it only shows the direction of the difference, which we already know.)
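Step 2b as a Python sketch. It assumes that the formula in this step is the standardized mean difference with the small-sample adjustment given in the Durlak paper cited at the end of this post, (N - 3)/(N - 2.25) × √((N - 2)/N), where N is the total number of data points; this is consistent with N being one of the inputs in step 2b. The `hedges_g` name is hypothetical, and the pooled SD cell in the Excel comment is a placeholder.

```python
from math import sqrt


def hedges_g(mean_e, mean_c, sd_pooled, n_total):
    """Standardized mean difference with a small-sample adjustment.

    Excel equivalent for the example (the pooled SD cell is a placeholder):
    =((I15-J15)/<pooled SD cell>)*((G14-3)/(G14-2.25))*SQRT((G14-2)/G14)
    """
    d = (mean_e - mean_c) / sd_pooled
    adjustment = ((n_total - 3) / (n_total - 2.25)) * sqrt((n_total - 2) / n_total)
    return d * adjustment
```

A negative result just means the second group has the higher mean; as noted above, only the magnitude matters here.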

Computing the confidence interval.
Step 3a. Using Excel's CONFIDENCE function, compute the following number: =CONFIDENCE(0.05, <your pooled SD>, <total N>). In my case, this value was 1.18.
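Excel's CONFIDENCE(alpha, standard_dev, size) returns the normal-distribution margin z(1 - alpha/2) × standard_dev / √size. Here is a Python equivalent using only the standard library (the `excel_confidence` name is hypothetical). For instance, a pooled SD of 1.80 with a hypothetical N of 9 gives about 1.18.

```python
from math import sqrt
from statistics import NormalDist


def excel_confidence(alpha, sd, n):
    """Same value as Excel's CONFIDENCE(alpha, sd, n): z_(1 - alpha/2) * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return z * sd / sqrt(n)


print(round(excel_confidence(0.05, 1.80, 9), 2))
```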

Step 3b. Compute the lower and upper bounds of the confidence interval by subtracting the number above from g and adding it to g:

lower bound = 1.76 – 1.18 = 0.58
upper bound = 1.76 + 1.18 = 2.94
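Step 3b and the significance check that follows amount to building the interval and asking whether it contains zero. A Python sketch, plugging in the g and half-width from my example (the function name is hypothetical):

```python
def interval_and_significance(g, half_width):
    """Return (lower, upper) bounds and whether the interval excludes zero."""
    lower, upper = g - half_width, g + half_width
    excludes_zero = lower > 0 or upper < 0
    return (lower, upper), excludes_zero


(lower, upper), significant = interval_and_significance(1.76, 1.18)
print(f"95% CI: ({lower:.2f}; {upper:.2f}), significant: {significant}")
```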

The 95% confidence interval is thus (0.58; 2.94). Strictly speaking, it tells me that if I replicated my experiment again and again, about 95% of the intervals computed this way would contain the true difference in calories burned, expressed as g.


The good news is that zero (no difference) is not in this interval, so I can conclude, at the 5% significance level, that I burn more calories on “bike & walk” days than on “gym” days. In other words, the difference is statistically significant. Which means I will have to increase the intensity of my gym workouts.

P.S. If you are into stats, here are some references behind this post:

The standardized mean difference was suggested as the best choice for reporting effect size in single-subject studies by Olive, M. L., and Smith, B. W., Effect size calculations and single subject designs, Educational Psychology, 25, 313-324 (2005).

Applying Cohen’s d formula to rank-transformed data for robustness was described in Schacht, A., Bogaerts, K., Bluhmki, E., and Lesaffre, E., A New Nonparametric Approach for Baseline Covariate Adjustment for Two-Group Comparative Studies, 64(4), 1110-1116 (December 2008).

Finally, the screenshots of the formulas for Hedges' adjustment of Cohen’s d were shamelessly lifted from Durlak, J. A., How to Select, Calculate, and Interpret Effect Sizes, Journal of Pediatric Psychology, 34(9), 917-928.


See also:

The Scientific Method

Design of Experiments

Keeping a Scientific Logbook

Single Subject Design