james mckay dot net
because there are few things that are less logical than business logic

Finding bugs in your code quickly using git bisect

git bisect is one of my favourite features of Git. It is a binary search tool that lets you quickly track down the revision that introduced a bug. Surprisingly, it doesn’t seem to be all that well known, so I thought it would be worth writing a refresher on what it is and how to use it.

git bisect: an introduction

The idea is very simple. If you know that your latest revision has a bug that wasn’t there a few weeks ago, and you can find a “known good” revision from round about that time, you can conduct a binary search of the revisions in between to find out which one introduced it.

So let’s say that you have 500 revisions to start off with. You’d mark the latest one as bad, then test, say, the 100th revision, find that it works as expected, and mark that as your last known good revision. Git will then automatically update to the 300th revision (halfway in between) for you to test. Mark as good or bad as appropriate, lather, rinse and repeat until you’re done.

Each test halves the range of revisions left to be tested, quickly narrowing the gap. In total, you have to test just O(log₂ n) revisions. This means that 1,000 revisions would only take one more test than 500, and one million would only take one more test than 500,000 and ten more tests than a thousand. Once you’ve found the offending change, you can very easily zoom right in on the problematic lines of code, rather than having to spend ages stepping through it all in the debugger.
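
To put rough numbers on that, the number of tests you need is just log₂ n rounded up (a back-of-the-envelope sketch, ignoring any skipped revisions):

500 revisions → ⌈log₂ 500⌉ = 9 tests
1,000 revisions → ⌈log₂ 1,000⌉ = 10 tests
1,000,000 revisions → ⌈log₂ 1,000,000⌉ = 20 tests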

How to use it

Before you start your bisect session, save your work using git commit or git stash. Then to start off your bisect session, type:

$ git bisect start

Next you need to tell Git the range to start off with. If your current HEAD revision is the bad one, you can just mark it as bad as follows:

$ git bisect bad

Next check out a revision that you know to be good and tell Git that it is a good one:

$ git checkout KNOWN_GOOD_REVISION
$ git bisect good
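
Incidentally, if you already know both endpoints up front, you can pass them straight to git bisect start (the bad revision first, then the good one) and skip the separate marking steps. A minimal sketch, using HEAD as the bad revision and the same placeholder as above for the good one:

$ git bisect start HEAD KNOWN_GOOD_REVISION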

Git will now check out a revision halfway in between the two, choosing the next one for you to test. You will see something like this:

Bisecting: 31 revisions left to test after this (roughly 5 steps)
[89f7bc018b5fc34c01bea545e3641ee2c77241ac] Bump version

Recompile and re-test your code at this revision. Look for the specific bug that you are trying to track down (ignore any other bugs for the time being) and mark the revision as either bad or good as required:

$ git bisect bad
$ git bisect good

After each of these steps, Git will choose another revision halfway in between, until you end up with the revision that introduced the bug:

$ git bisect bad
164f5061d3f54ab5cba9d5d14ac04c71d4690a71 is the first bad commit
commit 164f5061d3f54ab5cba9d5d14ac04c71d4690a71
Author: James McKay <code@jamesmckay.net>
Date:   Sun Nov 11 14:18:44 2018 +0000

    Move some test fixtures about for consistency.

:040000 040000 d8dc665d03d1e9b37c5ee2dcde8acc032e306de8 0077c62618b69a20e5dbf6a61b42701a3ba2c156 M	src

Once you’ve found the offending commit, reset to go back to where you started:

$ git bisect reset

Some useful tips

Use git bisect log to see a list of all the revisions you’ve checked so far:

$ git bisect log
git bisect start
# bad: [e38970b3100deecfdbc0ec183c527b49a6e68157] Don't auto-register types by default. Resolves #27.
git bisect bad e38970b3100deecfdbc0ec183c527b49a6e68157
# good: [dcb6a346e9130e736f45f65761ee57fd337483d7] Bit of tidying up.
git bisect good dcb6a346e9130e736f45f65761ee57fd337483d7
# good: [89f7bc018b5fc34c01bea545e3641ee2c77241ac] Bump version
git bisect good 89f7bc018b5fc34c01bea545e3641ee2c77241ac
# bad: [c08ed22ef9ac9cc66c56562b01143333fd61beae] Builders for conventions by name and by scan.
git bisect bad c08ed22ef9ac9cc66c56562b01143333fd61beae
# bad: [3fbc17dc37c35f963c5cea22814408ceac61787f] Bump version: release 0.2.0.
git bisect bad 3fbc17dc37c35f963c5cea22814408ceac61787f
# good: [e60f5d82b16e7b6ae739fa21cb1fc6c224d11c1a] Add link to documentation
git bisect good e60f5d82b16e7b6ae739fa21cb1fc6c224d11c1a
# good: [052e765169b71e691c70b7f458593f5552c75d41] Add resolution for arrays.
git bisect good 052e765169b71e691c70b7f458593f5552c75d41
# bad: [164f5061d3f54ab5cba9d5d14ac04c71d4690a71] Move some test fixtures about for consistency.
git bisect bad 164f5061d3f54ab5cba9d5d14ac04c71d4690a71
# first bad commit: [164f5061d3f54ab5cba9d5d14ac04c71d4690a71] Move some test fixtures about for consistency.

Use git bisect visualize to show your bisect progress in a GUI tool:

$ git bisect visualize

If you can’t tell whether a revision is bad or good (for example, because it won’t compile), use git bisect skip:

$ git bisect skip
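
One more tip: if your test can be scripted, git bisect run will drive the whole bisect for you. It runs the command you give it at each step and marks the revision based on its exit code: 0 means good, 125 means skip, and anything else from 1 to 127 means bad. A minimal sketch, where ./run-tests.sh is a hypothetical stand-in for whatever builds and tests your project:

$ git bisect run ./run-tests.sh   # hypothetical build-and-test script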

On a final note, you don’t need to worry if you haven’t been meticulous about using git rebase to keep your source history linear. git bisect is smart enough to handle branches.

All in all, git bisect is a really useful tool. It allows you to zoom in on bugs in your source code very quickly, even in large repositories with extensive histories. It’s a skill that I would heartily recommend for every developer’s and tester’s toolbox.

On the “reproducibility crisis” in science

I’ve had two or three people tell me about the “reproducibility crisis” in science in the past few months. The most recent such comment was at the weekend, which coincidentally came right at the time when a 2016 Nature article on the subject was at the top of Hacker News. Here are some thoughts on the matter.

First of all, I’d like to make it clear that the reproducibility crisis doesn’t call the entire scientific method into question right across the board. There may be a lot of papers published in the scientific literature that can’t be replicated, but there are also vast swathes of others that can be and are, often by multiple independent methods. The fact that some studies can’t be reproduced says nothing whatsoever about the validity of the ones that can, and it’s the ones that can that go on to establish the scientific consensus and make their way into school and university textbooks.

In fact, it’s only to be expected that the scientific literature would contain a sizeable proportion — perhaps even a majority — of non-reproducible studies. Scientists are only human, and if they rarely if ever made any mistakes, then that would suggest there was some form of underhanded collusion going on. It’s all too easy for them to inadvertently end up making mistakes, taking shortcuts, or writing down lab notes that don’t accurately describe exactly what they did. But that is why science demands reproducibility in the first place — to filter out problems such as these.

It’s important to realise that the reproducibility crisis only really affects the very frontiers of science — cutting edge research where the practices and protocols are often still being developed. There will always be a certain amount of churn in areas such as these. It rarely if ever affects more well established results, and it’s not even remotely realistic to expect it to cast any doubt on the core fundamentals. We can be absolutely confident that subjects such as relativity, quantum mechanics, Maxwell’s Equations, thermodynamics, the Periodic Table, evolution, radiometric dating, Big Bang cosmology and so on are here to stay.

Furthermore, scientists are actively working on ways to improve things. There is a whole scientific discipline called “meta-science,” which is devoted to increasing quality while reducing waste in scientific research. That is why scientists have adopted techniques such as peer review, blind studies, statistical methods to detect fraud (such as Benford’s Law) and the like. One recent innovation has been pre-registration of clinical trials as a means to combat publication bias and selective reporting: in many cases, the studies are peer reviewed before the results are collected rather than after the fact.

Interestingly, the disciplines that are most profoundly affected by the “reproducibility crisis” are the social sciences — sociology, psychology, medicine, and so on. These are subjects which first and foremost concern the vagaries of humans and other living beings, which deal with very imprecise data sets with wide spreads of results, and which predominantly rely on statistics and correlations that are much more open to interpretation and studies that are qualitative rather than quantitative in nature. It is less of a problem for the more exact sciences, such as physics, chemistry, mathematics, geology, astronomy, or computer science.

The thing about science is that its foundations of testability and rigorous fact-checking tend to bring it into direct conflict with dishonest people, hidden agendas, and vested commercial or political interests. Consequently there is no shortage of people who will do whatever they can to try and undermine public trust in the scientific community and even the scientific method itself. One of the ways that they do so is to take real or perceived imperfections and shortcomings in science, blow them out of all proportion, and make them appear far more significant and far more damaging to the legitimacy of scientific scrutiny than they really are. But that’s just dishonest. Science may not be perfect, and non-reproducible papers may be plentiful, but nobody gets a free pass to reject anything and everything about science that they don’t like.

Featured image: United States Air Force Academy

How not to stop Brexit

For better or for worse, the Conservatives under Boris Johnson have won the General Election with a majority of either 78 or 80, depending on which way the result in St Ives turns out. This means that, for better or for worse, Brexit is definitely going ahead, and there will not be a second referendum.

I personally voted Remain in 2016. Leaving the EU didn’t make much sense to me from either an economic or a logistical perspective, and I was particularly unimpressed with the arguments I was seeing from the “Leave” side, many of which seemed anti-intellectual, tin-foil hat conspiratorial, or simply not true. And I’ve never been impressed with the incessant references to the referendum result as “The Will Of The People.” The 48.1% of us who voted Remain are people too.

But Brexiteers have one legitimate concern that I have to agree with. The EU has a problem with taking “no” for an answer.

I’ve seen this playing out time and time again for over a quarter of a century. We saw it, for example, with the Maastricht Treaty and with the Lisbon Treaty (which was just a rebranding of the EU Constitution). Whenever an EU member state has a referendum that gives a result that Brussels doesn’t like, they simply make them vote again until they come up with the “right” result.

This isn’t democracy: it’s democracy theatre. It’s a complete sham, and if truth be told it makes the idea of a so-called “People’s Vote” seem really, really creepy, because it would just be more of the same. It’s a toxic, anti-democratic practice that needs to be broken.

Nevertheless, the 2016 referendum could potentially have been undone if only Remainers had gone about it the right way. If the UK were to leave the EU with some kind of interim arrangement in place, and then have a “rejoin” referendum some months later, that would respect the mandate from 2016, avoid the mathematical problems with having three options on the ballot paper (deal/no deal/remain) rather than two, and generally have a much more credible claim towards being truly democratic. It would be clean, fair and above board.

Unfortunately, no political party proposed this option. Instead, far too many politicians did everything that they could to try to undermine and frustrate the referendum result before it could be carried out. In fighting tooth and nail for approaches that were not democratically credible, Remainers failed to come up with one that was. And in so doing, they made the whole process far, far, far more chaotic, stressful and acrimonious than it could otherwise have been.

Featured image credit: Tim Reckmann

Sorry, but I won’t watch your video

From time to time, when I’m discussing or debating something online, people send me links to videos — usually on YouTube — that they expect me to watch in support of whatever point they’re arguing.

Nowadays, I usually decline. I’m always open to a well-reasoned argument, even if I disagree with it. But it needs to be presented in a format where I can engage with it properly, fact-check it easily, and make sure I have understood it correctly. The video format doesn’t do that, and in fact more often than not it gets in the way.

  • Videos are inefficient. I can read far more quickly than I can watch a video. When I am reading, I can also skip over content that is already familiar to me, or that isn’t relevant to the topic at hand.
  • Videos are not searchable. With written material, especially online, I can quickly copy and paste words or phrases into Google to fact-check it, or into a forum post to reply to you or ask about it elsewhere. I can’t easily do this with videos.
  • Videos spoon-feed you. When reading, I can step back and ask questions. If there’s something I haven’t understood, I can re-read it several times to make sure that I get it. By contrast, with videos, the videographer sets the pace, and you have to fight against that if you want to do any critical thinking. Sure, you can pause and rewind, but doing so is much more inefficient and imprecise than with written text.
  • Videos are soporific. I’ve lost count of the number of times that I’ve momentarily fallen asleep watching a video and had to rewind it because I’ve missed an important point. Or gotten distracted onto something else and lost track of what was being said. By contrast, when I’m reading, my mind is totally focused on the text.
  • Videos are often far too long. Sorry, but if your video is an hour long, then I can tell from that fact alone that either it is a Gish Gallop, or it takes far too long to get to the point, or it is trying to tackle a subject that is too complicated to address properly in video format anyway.

Videos have their place, and the points that they make may well be valid and correct. But they are best suited for entertainment or inspiration. They are less effective for education or information, and are simply not appropriate for online debate and discussion. If someone asks you to watch a video, ask them to provide you with a text-based alternative — a web page, a PDF or a PowerPoint presentation — instead. If they really don’t have any alternative other than a video, ask them to summarise it and provide timestamps. Your time is valuable. Don’t let other people dictate how you spend it.

Featured image credit: Vidmir Raic from Pixabay

The vagaries of humans and other living beings

The title of this post is a quote from my school report when I was thirteen years old. My headmaster wrote about me, “His mind is better attuned to exact subjects such as Maths and Physics than to those concerning the vagaries of humans and other living beings.”

It was a fair point. I was a pretty geeky kid when I was at school. I excelled in subjects such as maths and physics, I did reasonably well at most other academic subjects — and I was utterly hopeless on the rugby pitch. But his comment highlighted something that’s worth bearing in mind whenever discussing subjects such as science and technology. There are two kinds of subjects that we get taught in school or at university, and that we deal with in the workplace. On the one hand, there are exact subjects, such as maths, physics, chemistry, geology, electronics, computing, and the like, while on the other hand, there are those that deal with the vagaries of humans and other living beings. And the two require completely different mindsets.

It’s a difference that I’ve felt keenly since I reactivated my Facebook account back in June after a two and a half year break. About a couple of months in, I wrote a post that simply said this:

Passion is not a substitute for competence.

This statement would be totally uncontroversial if I posted it on one of our Slack channels at work. When you’re working with exact subjects such as science or technology, you simply can’t afford to let passion become a substitute for competence. I’ve seen projects have to be rewritten from scratch and tech companies fail altogether because they made that mistake, especially about ten years ago when the whole “passionate programmer” hype was at its height.

But many of my friends on Facebook are pastors. Their entire vocations are built around dealing with the vagaries of humans and other living beings. To people such as them, competence may still be necessary, but the relative importance that they can (and should) place on passion of one form or another is much, much greater. To them, saying that “passion is not a substitute for competence” has completely different connotations.

Needless to say, my short, seven-word post turned out to be pretty controversial. And that controversy took me completely by surprise.

The essential difference

Exact subjects deal in hard evidence, empirical data, and systems tightly constrained by reason and logic. They leave little or no room for opinion or subjective interpretation, apart from situations where there is insufficient data to differentiate between two or more alternatives. The arts and humanities, on the other hand, are much more open to interpretation, speculation, and subjective opinion. Exact subjects require precise definitions and literal thinking, often expressed through symbols and code. The arts and humanities are expressed in figures of speech, analogy, poetry, and terms that are often ambiguous and very loosely defined.

Both are equally important. But they are not interchangeable.

The mistake that all too many people make is to treat exact subjects in the way that they would treat the vagaries of humans and other living beings, or vice versa. For non-technical people, this is all that they know how to do. Learning to think in the exact, rigorous manner required by the sciences does not come easily to many people. It requires training, practice, discipline, experience, patience, and hard work. Subjects that concern the vagaries of humans and other living beings, on the other hand, only require intuition, empathy and common sense, and tend to be the “default” way of thinking for most people.

This is why pseudoscience gets so much traction. Subjects such as astrology, cryptozoology, alternative medicine, water divining or graphology have a scientific looking veneer, but rather than adopting an exact, rigorous approach, they appeal to the vagaries of analogy, hand-waving approximation, empathy and “common sense,” which yield results that are much easier for most people to relate to. Unfortunately, since they are dealing with exact, deterministic systems, this approach is inappropriate, and therefore misleading or even simply wrong.

It’s also common for non-technical people to view science as if it were a matter of subjective opinion. This is especially the case when the exact sciences produce results that they find awkward for political or economic reasons. I’ve lost count of the number of climate change sceptics who I’ve seen saying “Surely if something is science, it should allow for multiple opinions,” for example. Sorry, but it doesn’t work that way. If it did, then we could have referendums on the laws of physics. You can make all the noise you like about The Will Of The People™, but good luck trying to abolish Maxwell’s Equations or the Second Law of Thermodynamics just because 51.9% of the population voted to do so. And then who can forget this:

“The laws of mathematics are very commendable, but the only law that applies in Australia is the law of Australia.” — Malcolm Turnbull, Prime Minister of Australia.

Context switching

But if some people make the mistake of viewing exact subjects as if they were subjective, human ones, there is an equal and opposite danger for those of us whose careers and expertise fall on the “exact” side of the table: to view the vagaries of humans and other living beings as if they were deterministic systems tightly constrained by reason and logic.

When you’re giving instructions to a computer, it takes what you say at face value and does what you ask it to do. If it doesn’t “get it” the first time (your code doesn’t compile, your tests fail, or whatever) you just tweak your code, rephrase it, and repeat until you get the results you want. You can’t do that with people. They filter what you say through a layer of assumptions and preconceptions about you and through their own expertise. When I said that passion is not a substitute for competence, my pastor friends didn’t have software engineering or recruitment in mind, but activities such as street evangelism or politics.

Nor can you keep rewording and refining your attempts to communicate your intentions or understanding to other people. If they’re genuinely interested, it might help, but much of the time they’ll either miss the point of what you’re saying, or else conclude that you’re just boring or even argumentative and obnoxious, and switch off.

Herein lies another problem. For if it’s hard to learn to think in exact, rigorous terms, it’s even harder to switch context between the two. And the hardest skill of the lot is to be able to bridge the gap between them.

Yet this is the very challenge that we face in software development teams. There is no subject more geared towards exact, rigorous, pedantic thinking than computer programming. If you get things wrong, Visual Studio lets you know it in no uncertain terms — in some cases dozens of times an hour. You are subjected to a feedback loop that makes working in a physics or chemistry lab look positively lethargic by comparison. You have to worry about spelling, capitalisation, and even tabs versus spaces. Yet at the same time, you are frequently being fed requirements from non-technical stakeholders that are vague, ambiguous, incoherent, self-contradictory, or even patent nonsense. As Martin Fowler said in his book, Patterns of Enterprise Application Architecture (and as I’ve quoted in the strapline of my blog), “there are few things that are less logical than business logic.”

Be aware of what you’re dealing with

If there’s one thing I’ve learned over the summer, it’s the need to have some empathy for how “the other side” thinks. I don’t think it’s right to expect non-geeks to develop exact, rigorous approaches to anything, but just to be aware that there are times when such approaches are needed, and not to denigrate or disparage those of us who work with them. But those of us of a more technical mindset need to be able to relate to both worlds. This being the case, the burden should be on us to bridge the gap as best we can.

Featured image: March for Science, Melbourne, April 22, 2017. Photograph by John Englart.