Mutation is not ‘random’

Ben Haley 2017
The word ‘random’ gets thrown around as if it explains things. ‘A coin flip is random’. ‘Mutation is random’. But these are woeful explanations, because random only exists over possibilities.

A coin flip comes up heads or tails. It cannot come up ‘1’, ‘empty’, or ‘gold’. Of all the things that can be, the coin flip can only be two things. And so a coin flip is not random. Rather a coin flip is randomly heads or tails.

Mutation too. Like a coin flip, DNA cannot be ‘1’, ‘empty’, or ‘gold’. Unlike a coin flip, DNA is not limited to two simple outcomes. Instead, complex and unknown rules guide what’s possible. DNA is restricted to a string of As, Ts, Cs, and Gs. The length stays about the same from parent to child. Only a few As will change to Ts. Sex marries one end of a mother’s strand to the other end of the father’s. Things biologists call transposons and retroviruses jump strands from place to place.

These rules determine what mutations are possible. New ones are discovered each decade. How many undiscovered rules further structure the ‘random’ possible mutations?

And so biology does not evolve ‘randomly’, but through a well-structured process which we are only beginning to uncover. Over billions of years, this process has invented a photovoltaic mat that nourishes the earth, including a talkative ape who’s often guilty of saying ‘random’ as if it were an explanation - problem solved. Better when this ape focuses on the rules that guide what’s possible and all that’s left to learn.

Opinions are data too

In response to Andy Matuschak's excellent article, Exalting data, missing meaning
There’s a tendency to conflate quantification with objectivity. For example, a test score counts as data while a teacher's opinion does not. But some of the most important quantifications are based on subjective opinions. For example, we vote based on our subjective opinions, count up the totals, and use the quantity to decide the leader.
We make decisions all the time. These force us to ‘quantify’ in the broad sense of the word. Yes or no? A binary quantification. How much? A continuous quantification.
The ultimate foolishness in data obsession is to throw out all those useful quantities that come from the black box of our consciousness and rely too heavily on automatable objective truths.

Why a great teacher fears standardized tests and how to make it better.

Sarah Hagan is a great teacher. But she fears standardized tests:
Will I be freaking out Monday morning? Yeah. Do you want to know why? I'm going to be in a room with my Algebra 1 students who are being forced to take a standardized test. A test that will tell my school district how well I did my job. A test that will label my students as smart or dumb. A test that will make me feel like a success or a failure. A test that will determine whether my students will be able to graduate with their high school diploma. A test that my kids will be stressing about because I've spent the entire year reminding them what a big deal it is. A test that many of my students are already convinced they are going to fail because they've never passed their standardized math tests before. (source)
This quote shook me. Sarah is a dedicated teacher. When she gets evaluated, the results should reflect her great performance. Her students are learning. When they get evaluated, their results should reflect their improvement. These things aren't happening. And the blame lies squarely on the evaluation.
Sarah points out several problems with the test, none of them focused on its content, all of them focused on how it makes her and her students feel. I want to home in on one of these, the students who are made to "feel dumb".
The problem is in the report:
The report focuses on student performance compared to their peers. Say a freshman starts algebra three years behind their peers and finishes one year behind. By any realistic measure, they have made an amazing gain. But when they see their progress report, they won't see "Amazing Gain!". Instead they will see "Below average". Again.
Imagine how frustrating this is for a student who is behind. When they work hard, they remain behind. What's the point of working hard? Imagine how deceitful this is to an advanced student. When they slack for a year, they remain advanced. Why not slack?
These reports emphasize the wrong outcome. Fortunately, there is a simple remedy. Rather than comparing a student to other students, compare them to their past. Show them how much better they are than last year. The reports should say something like "This year, Genevieve correctly solved 5 types of problems that she could not solve last year". By this measure most students will improve every year because most students learn every year. Instead of feeling dumb as ever, students will feel smarter than ever. This is better, not just because it is more compassionate, but also because it is true.

Learning the rules of image transformation


How a computer can learn the rules of rotation, reflection, scaling, translation, and many other transformations that images can undergo.

We recognize images despite transformations.

As your eyes move across this sentence, the image hitting your retina is constantly changing. Yet you hardly notice. One reason is that your brain recognizes letters regardless of their position in your field of view.

Consider the following image. Look first at the blue dot and then at the red. Notice that the number '2' between them is recognizable regardless of your focus. This, despite the fact that the image is falling on a completely different set of neurons.

Images go through many such transformations. They reverse, rotate, scale, translate and distort in many ways we have no words for. That's not to mention all the changes in lighting that can occur. Through all of this they remain recognizable.

The number of transformations that can happen to an image is infinite, but that does not mean that all transformations are possible or probable. Many never occur in the real world and our brain cannot recognize the images after these improbable transformations.

But computers are bad at learning the rules of image transformation.

The rules of transformation are important for anyone who wants to teach a computer how to process images. The algorithms that are best at image recognition learn a representation of the world that considers many shifts of focus. These are called translations as illustrated above.

However, these algorithms do not learn that images can be translated, the way they learn to recognize digits. Instead the laws of translation are programmed into the algorithm by the researcher.

Would it be possible to have computers learn about translation without telling them explicitly? What about the myriad other transformations that are possible?

I propose that these transformations can be learned.

I propose they can, and submit the following experiment as evidence. First, I show that we can learn that flipping an image upside down is a valid transformation, but randomly rearranging the pixels is not. Then I show that examples of each of the aforementioned transformations can be discovered. Finally, I wax poetic about the future of this kind of work.

note: I can't claim that this work is unique. I just hope that it is interesting.

What data are we using?

I will use MNIST data, a handy collection of handwritten digits.

[Figure: sample handwritten digits from MNIST]

Where to focus?

We will focus on a small region of data. Specifically, the three vertical pixels highlighted in each digit below.

[Figure: digits with the three vertical pixels highlighted]

What patterns are common?

Next we look at the pixel patterns across many images.

[Figure: frequency of each three-pixel pattern across many images]

We see that certain patterns are more common than others. For example, there are many cases where one of the three pixels is blank but only one case where this is the middle pixel.

Clearly the patterns observed are not random ones.
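The counting step described above might look roughly like the following Python sketch. It does not load MNIST. Instead, `synthetic_triples` is a hypothetical stand-in that cuts three vertical pixels out of a short column containing one contiguous stroke of ink, and the column length, stroke length, and sample size are all assumptions made for illustration.

```python
import random
from collections import Counter

def synthetic_triples(n, seed=0):
    """Hypothetical stand-in for MNIST: three vertical pixels cut from a
    7-pixel column that contains one contiguous stroke of ink."""
    rng = random.Random(seed)
    triples = []
    for _ in range(n):
        column = [0] * 7
        start = rng.randrange(7)          # where the stroke begins
        length = rng.randrange(1, 4)      # stroke is 1 to 3 pixels long
        for i in range(start, min(start + length, 7)):
            column[i] = 1
        top = rng.randrange(5)            # top of the three-pixel window
        triples.append(tuple(column[top:top + 3]))
    return triples

def pattern_counts(triples):
    """Count how often each on/off pattern of the three pixels occurs."""
    return Counter(triples)

counts = pattern_counts(synthetic_triples(1000))
for pattern, count in sorted(counts.items()):
    print(pattern, count)
```

Because the synthetic stroke is contiguous, the on-off-on pattern never appears in this toy data, which loosely mirrors the rarity of a blank middle pixel in the real digits.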

Upside down, the patterns have similar frequencies.

Then we flip the pixels upside down and look at the patterns.

[Figure: pattern frequencies after flipping the pixels upside down]

Notice that the patterns have roughly the same frequency as before. Flipping upside down does not substantially change the image.

But switching the first two pixels produces very different frequencies.

Finally, we make an improbable change, switching the first two pixels, while keeping the third in place.

[Figure: pattern frequencies after switching the first two pixels]

The patterns have a very different frequency than in the prior cases. For example, the pattern where two filled-in pixels surround a blank pixel is common, whereas it was observed only once in the previous two examples.

What's going on?

We are seeing the difference between a probable image transformation, flipping upside down, and an improbable one, only switching the first two pixels. Objects in the real world flip upside down regularly. But, unless we are stretching taffy or entertaining contortionists, it is unusual to see the middle of something switch places with its top.

After a probable transformation, the image retains the same patterns as any other image. After an improbable transformation the image contains improbable patterns.

Can we quantify the effect?

We can estimate the likelihood of each reordering given the frequencies observed in the original order. For example, if a pattern, off-on-on, occurred 20% of the time in the original image then it will most likely occur 20% of the time in a valid rearrangement of the image. Concretely we use the multinomial distribution.

order    loglikelihood
1 2 3           -19.92
1 3 2          -558.85
2 1 3          -596.12
2 3 1          -597.38
3 1 2          -583.60
3 2 1           -45.92

We see quantitative evidence that reinforces our visual proof and our intuition. The reverse order, "3 2 1", is more similar to the original order, "1 2 3", than any other possible transformation.
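As a rough illustration of the calculation above, here is a Python sketch that scores all six orderings with the multinomial log-likelihood. It is not the original analysis: `synthetic_triples` is a hypothetical stand-in for the MNIST pixels, and the pseudocount of 0.5 is my assumption, added so that a pattern never seen in the original order does not get probability zero. The numbers it prints will not match the table above.

```python
import math
import random
from collections import Counter
from itertools import permutations, product

PATTERNS = list(product([0, 1], repeat=3))  # all 8 on/off patterns

def synthetic_triples(n, seed=0):
    """Hypothetical stand-in for MNIST: three vertical pixels cut from a
    7-pixel column that contains one contiguous stroke of ink."""
    rng = random.Random(seed)
    triples = []
    for _ in range(n):
        column = [0] * 7
        start = rng.randrange(7)
        length = rng.randrange(1, 4)
        for i in range(start, min(start + length, 7)):
            column[i] = 1
        top = rng.randrange(5)
        triples.append(tuple(column[top:top + 3]))
    return triples

def multinomial_loglik(counts, probs):
    """Log of the multinomial pmf for the observed counts under probs."""
    n = sum(counts)
    loglik = math.lgamma(n + 1)
    for c, p in zip(counts, probs):
        loglik += c * math.log(p) - math.lgamma(c + 1)
    return loglik

triples = synthetic_triples(1000)

# Pattern frequencies in the original order, smoothed with a pseudocount
# (an assumption, not part of the original write-up).
original = Counter(triples)
pseudocount = 0.5
denom = len(triples) + pseudocount * len(PATTERNS)
probs = [(original[p] + pseudocount) / denom for p in PATTERNS]

# Score every reordering of the three pixels against those frequencies.
logliks = {}
for order in permutations(range(3)):
    reordered = Counter(tuple(t[i] for i in order) for t in triples)
    logliks[order] = multinomial_loglik([reordered[p] for p in PATTERNS], probs)

for order, value in logliks.items():
    label = " ".join(str(i + 1) for i in order)  # 1-based, as in the table
    print(label, round(value, 2))
```

On this toy data the original order scores highest and the full reversal comes second, since every other reordering turns a common pattern into the on-off-on pattern that contiguous strokes never produce.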

Let us consider a wider frame of reference.

Up until now, we have considered a very limited set of transformations, each possible order of three pixels. Now let's focus on a wider region. We will continue to use the three pixels as before, but we will compare them to a wider field, the 12 surrounding pixels, highlighted below.

[Figure: the three pixels and the 12 surrounding pixels, highlighted]

What transformations are likely?

Now let us consider each set of three pixels from this wider field in each of their possible orders. As before, we will use the multinomial distribution to determine how similar each set and order is to our original three pixels.
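The search over the wider field can be sketched in Python along the following lines. Everything about the data here is assumed for illustration: the field is laid out as a 5 by 3 grid (the three reference pixels plus 12 neighbours), `synthetic_patch` is a hypothetical stand-in that draws one straight stroke instead of a real digit, and the 0.5 pseudocount is my own smoothing choice.

```python
import math
import random
from collections import Counter
from itertools import permutations, product

ROWS, COLS = 5, 3                         # assumed layout of the 15-pixel field
PATTERNS = list(product([0, 1], repeat=3))
REFERENCE = ((1, 1), (2, 1), (3, 1))      # the original three vertical pixels

def synthetic_patch(rng):
    """Hypothetical stand-in for an MNIST crop: one straight stroke."""
    patch = [[0] * COLS for _ in range(ROWS)]
    if rng.random() < 0.5:                # vertical stroke
        col, start = rng.randrange(COLS), rng.randrange(ROWS)
        for r in range(start, min(start + 3, ROWS)):
            patch[r][col] = 1
    else:                                 # horizontal stroke
        row, start = rng.randrange(ROWS), rng.randrange(COLS)
        for c in range(start, min(start + 3, COLS)):
            patch[row][c] = 1
    return patch

def pattern_counts(patches, pixels):
    """Count the on/off patterns seen at an ordered triple of pixels."""
    return Counter(tuple(p[r][c] for r, c in pixels) for p in patches)

def loglik(counts, probs):
    """Multinomial log-likelihood of the counts under the reference probs."""
    n = sum(counts.values())
    total = math.lgamma(n + 1)
    for pattern, prob in zip(PATTERNS, probs):
        c = counts[pattern]
        total += c * math.log(prob) - math.lgamma(c + 1)
    return total

rng = random.Random(0)
patches = [synthetic_patch(rng) for _ in range(300)]

# Smoothed pattern frequencies at the reference pixels.
ref_counts = pattern_counts(patches, REFERENCE)
denom = len(patches) + 0.5 * len(PATTERNS)
probs = [(ref_counts[p] + 0.5) / denom for p in PATTERNS]

# Score every ordered triple of distinct pixels in the field.
all_pixels = [(r, c) for r in range(ROWS) for c in range(COLS)]
scores = {
    triple: loglik(pattern_counts(patches, triple), probs)
    for triple in permutations(all_pixels, 3)
}
ranked = sorted(scores, key=scores.get, reverse=True)
for triple in ranked[:5]:
    print(triple, round(scores[triple], 2))
```

The script prints the five highest-scoring ordered triples. Scoring all 15 x 14 x 13 = 2730 candidates by brute force is cheap at this size, though the count grows quickly as the field widens.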

[Figure: the most likely sets of three pixels, in descending order of likelihood, with the 'middle' pixel marked by a red dot]

Here we see the sets of pixels most like our original three. The most likely set (our original three itself) occupies the top left corner and the subsequent images show other sets in order of descending likelihood. For clarity, a red dot has been put on the 'middle' pixel, where 'middle' is defined as the middle pixel in the original image.

First, note that the middle pixel always stays in the middle through all of the likely transformations. This is because likely transformations tend to maintain order (except that they may flip entirely as illustrated before).

Next, notice that we see examples of each of the likely transformations that we already know to exist.

  • Rotations are common, with vertical and horizontal patterns appearing nearly equally often.
  • Reversals are similarly common, though they cannot be seen because the two ends are indistinguishable.
  • Translations are ubiquitous, as can be clearly seen by how the pixels shift within the region in focus from left to right and top to bottom.
  • Distortions are common. Very often we see not-quite-straight lines, ones bent or stretched.
  • Scaling occurs, though it is rare. Only occasionally do we see pixels more than one unit apart. I haven't determined why scaling is rare in this image. Obviously, in the real world scaling is very common, as you can see if you press your face against the screen.

Through all of these transformations the basic pattern holds: the middle pixel in the original image remains the middle through each likely transformation. We see no examples of taffy-stretching contortionism.

What transformations are unlikely?

Next let's look at those transformations ranked least likely. This will assure you that I have not hoodwinked you by divining patterns in the results that would be seen regardless of their order.

[Figure: the least likely sets of three pixels, with the 'middle' pixel marked by a red dot]

Here we see the misfits, the unlikely patterns. The layout is the same as before, except that the top left is occupied by the least likely transformation.

The thing to notice here is the breakdown of our basic pattern. What was the middle, marked in red, is no longer in the middle. Instead we see the two tails abutting one another and the red middle cast aside. It stands to reason that these transformations are as unlikely as taffy and contortionists.


We understand that images can transform, but computers, generally, are blind to this fact. Here, I have given a simple visual demonstration of how those rules of transformation can be discovered by a computer using real world datasets with little external expertise.

Certainly, huge strides in computational efficiency would be necessary to make this a practical approach to image recognition problems, and it may well be that we can describe the rules of image transformation so well that a computer need never discover them on its own.

However, it is important to realize that such discovery is possible. And this does show practical promise for several reasons:

  1. Not all transformations are easy to describe. Even this simple inquiry uncovered many likely image transformations which are difficult to describe formally. Likely images were stretched and bent in ways that are quite familiar to the eye, but difficult to describe using the geometric transformations we learned in high school. We can only explicitly teach computers those things that we can describe, for the rest, computers, like us, must learn on their own.
  2. Not all datasets are so well understood. I have focused on images and image transformation, a subject both intuitive and well studied. But there are many datasets for which we have much less intuition and understanding. Think, for instance, of weather systems, for which we have only recently developed sophisticated datasets and modeling tools, or genetic sequences, which are still 95% mysterious to us. In these less familiar domains we might find that an algorithm which discovers the rules of transformation can quickly outpace the experts who attempt the same.
  3. At some point new rules must be discovered. While we may be able to impart to our computers all the benefit of our expertise, in the same way that we impart it to children, at some point they must reach the boundaries of what is known. If computers are to join us in the exploration of brave new domains of thought, they must become more adept thinkers. Discovering the rules of transformation as I have illustrated here is part of what makes us intelligent. It is a necessary step to producing truly intelligent machines.

Do they love to learn?

Why our standardized exams should measure student attitude.

Editorial By Ben
Great teachers inspire their students.  They show them the beauty of a subject and ignite within them a burning desire to learn.
The effect of such a teacher reaches far beyond his or her classroom. Our lives are shaped by these people because, once the will to learn is burning bright within us, it continues without the catalytic spark of its creator.
And yet, when we use standardized exams to measure our students, our classrooms, and our education system, we ignore attitude.  The teacher that inspired a lifelong passion for learning is little acknowledged for his or her labor, the fruits of which are spread across the remainder of the student's lifetime.
Why is this? Why do we ignore this fundamental outcome that aligns so closely with our ideal of great teaching?

Perhaps we cannot measure attitude?

At this point, you are probably thinking that we don't measure attitude for a good reason. Several objections come to mind:
  • Perhaps attitude cannot be measured reliably by a multiple choice survey?
  • Perhaps our intuition is wrong -- ability has little to do with attitude?
  • Perhaps measuring attitude is too difficult and time consuming?

Except that we already have.

I could address these plausible objections one by one using a combination of research and reasonable arguments. Fortunately, the Programme for International Student Assessment (PISA) has made my job easy.
In 2012, PISA administered an international Mathematics exam to half a million 15-year-olds around the world. This exam included a 30-minute survey on student attitudes. They have also been gracious enough to provide an extremely good write up of their results.
This quick assessment of attitude was highly informative.  For example, PISA measured students’ math anxiety by asking how much they agree or disagree with statements like, “I get very nervous doing mathematics problems.”  A student who was among the top 15% most anxious math students was likely to perform more than one full grade level below his or her peers with average math anxiety.  
Many attitudes had similar correlations with student success (see table 1).

| Attitude | Example question | Difference in performance (grade level) |
| Sense of belonging | I feel like an outsider at school. | |
| | I give up easily. | |
| Openness to problem solving | I am quick to understand things. | |
| Perceived control of success | Whether or not I do well in mathematics is completely up to me. | |
| Intrinsic motivation to learn | I do mathematics because I enjoy it. | |
| Extrinsic motivation to learn | I will learn many things in mathematics that will help me get a job. | |
| | I can understand graphs presented in newspapers. | |
| | I learn mathematics quickly. | |
| Math anxiety | I get very nervous doing mathematics problems. | |
Table 1 - How math attitudes relate to math aptitude
Students taking the PISA exam were asked several attitude questions in each of the above categories. A one standard deviation difference in their responses corresponded to the following grade level change in performance. One grade level corresponds to one typical year of improvement. An example of each type of question is provided.  (summarized from data presented in the PISA report)

Perhaps aptitude already tells us everything we need to know?

Okay, so attitude can be quickly measured and it correlates to aptitude.  But if we are going to include it on our standardized exams, it must provide additional value.  Perhaps aptitude already tells us everything we need to know?

Except that attitude precedes aptitude.

Intuitively we know that changes in attitude regularly precede changes in aptitude.  A student gets inspired, works hard at math, and then becomes better at math.
At PERTS, I work with a group that delivers growth mindset interventions.  These interventions are designed to convince students that the brain is like a muscle: it grows stronger with effort.  We work hard to measure the effect of these interventions both on attitude and aptitude.  The pattern that we see is reliable.  First, students become convinced that the brain is like a muscle, and then their grades go up.  We can predict the long term success of a student earlier and better if we measure their changes in attitude.
Moreover, there is solid evidence that GPA predicts college success better than standardized exam scores (using both is best of all).  Many researchers believe that this is because GPA measures 'non-cognitive factors', things like attitude and behavior. This is additional evidence that our standardized exams would be more powerful instruments if they measured attitude too.
However, more work in this area is needed.  We know that attitude precedes aptitude, but we don’t know by how much or how precisely we could detect the effect.  How much better would our long term predictions of success become if we measured student attitudes?  How much better could we assess the impact of teachers and teaching methodologies?  The answers to these questions require a study on the scale of the PISA exam, one that follows a group of students over an extended time period.

Ok, I'm convinced. What can I do to help?

PISA's work is a great start, and there are others too. But there is still a lot of work to do as a society to get attitude assessment included as a basic part of our standardized education assessment.
  • Tell your friends - Our education system is ultimately beholden to us. If we as a collective think that measuring student attitude is a priority, then it will become a reality.
  • Measure your students - If you are an educator, measure your students' attitudes and use these measurements to evaluate your impact.  PERTS developed an assessment that can be done online, or you can adapt the PISA pencil-and-paper assessment.
  • Help us quantify the effect of attitude on aptitude - If you help to administer education to large groups of students you could help the most by measuring student attitudes over time.  If someone can prove that measurable changes in attitude regularly precede and predict meaningful changes in aptitude, then the case for measuring attitude will be made much stronger.

And what will the future look like?

Imagine a world where 'teaching to the test' means inspiring your students, igniting within them a burning desire to learn that contributes to happiness and success throughout their entire lives.  This world can be ours if we can learn how to test what matters most.

Nice Virus

On the whole, viruses are actually good for their hosts.

Editorial by Ben
It is a common misconception that most biological viruses are bad for their hosts. The driving reason for this misunderstanding is that the viruses we care about, such as influenza and HIV, are bad. These viruses hurt us and we hurt them.
But these pathogenic viruses represent a very small minority of the viruses in the world. If you sample seawater you will find 10 times more viral particles than bacterial cells. Could all of these packets of RNA and DNA be bad?
The answer is no. What we are calling a virus would be more accurately called a message. Just like messages on the internet, a small minority are pathogenic viruses aimed to hurt their hosts. But the vast majority are welcome messages. If they were not, then a cell, just like a computer, would simply stop listening to external messages.
If the cell is listening to these messages even though some of them are fatal, they must confer some evolutionary advantage. The total advantage of the useful messages must outweigh the detriment of the pathogenic ones. What sort of advantages are these?
In essence viruses play the role of a letter in a critical message exchange system. We don't know everything that RNA and DNA do, but we do know that viruses help to spread useful bits of code between cells. This includes useful proteins, regulatory sequences, and sequences that we do not yet understand the value of.