AoC Day 3 – Mull It Over

Day 3 of AoC this year was the first day that honestly made me work. The first part of it pretty straight forward. You’re given a very long string input (mine was just over 18,000 characters). Within that string is a wide variety of alphabetical, numerical, and special characters. You’re looking for a sequence that is “mul(#,#)” where # is an number 1 – 3 numbers long (i.e. mul(123,1) would be valid). Whenever you find that sequence you want to sum the product of those two numbers in the parenthesis.

For me, this screamed regular expression to me. I compiled for the pattern I was interested in, and then just iterated through all of the matches and got my answer very easily.

import re

pattern = re.compile('mul\\((\\d{1,3},\\d{1,3})\\)')
matches = re.findall(pattern, line)
for match in matches:
     tmp = match.split(',')
     answer += int(tmp[0]) * int(tmp[1])

Regular Expressions

It may be important to have a quick aside about regular expressions (regex) and what they are. Regex is a great tool that I absolutely love because it’s so powerful. Most people are familiar with literal expressions. Say for example I want to search for a particular address. I can go into my search engine and search for 101 Main Street and it will/won’t find the results just fine. The same is true for a phone number. If I want to search for the phone number 555-555-5555 through a disk image or whatever, I can search for that very easily.

What is much harder though would be to search for ALL phone numbers. This is where regex comes into play. Regex essentially allows us to describe what we’re looking for. Going back to our phone number example, we know that within the US at least a phone number consists of 3 digits, followed by 3 digits, followed (generally) by a “-“, followed by 4 more digits. I can describe that sequence by using a regex of \d{3}.\d{3}-?\d{4} where \d represents a digit 0-9 and {3} represents repeating that 3 times. The area code is separated by a single character of any time (the .). And the prefix and suffix is separated by an optional -. If I wanted to, I could modify slightly to accommodate the use of parenthesis around the area code by using the regex (?\d{3})?.\d{3}-?\d{4}.

Part 2

Now the 2nd part gave me significant issues. Within the input string, there is also a series of do() and don’t(). For example xmul(2,4)&mul[3,7]!^don't()_mul(5,5)+mul(32,64](mul(11,8)undo()?mul(8,5)). As you can see this is intermixed with the “mul” patterns mentioned above. The trick here is to only count the “muls” that happen between a do() (a.k.a. START) and a don’t() (a.k.a. STOP).

My first plan was to stick with regular expressions, and to capture only mul groupings that were between the start/stop points. I will tell you, I spent a lot of time trying to come up with a regular expression that would work for this. I used regex101.com for this. If you’re not familiar with it, I ABSOLUTELY love the site and use it every time I have to play with regular expressions. I love the fact that I can put in test data and live as I am writting the regex, see what it is/isn’t capturing. It makes it so much easier to troubleshoot as opposed to trying to start/stop in Python. Tried as I might (and I eventually broke down and tried using ChatGPT to help), I couldn’t come up with a regex that would only capture the things I was interested in.

I was pretty well stuck when I started to browse the AoC chat on slack at work. One of the people who gets paid to code for a living had mentioned a state machine. I was vaguely familiar with the idea, but hadn’t ever actually done one myself (turned out I had but didn’t realize what exactly it was). That got me researching a little bit and thinking to where I could finally come up with an idea.

In short, a state machine keeps track of the current state of whatever it is you’re tracking (in this case start vs stop). I ended up modifying my regex not to find just the “interesting” muls, but instead to find every instance of do(), don’t() and mul(#,#). The object that gets returned in python presents the matches in the order that they are found. This means that with a state machine, I can start caring about muls I find when I am in a start state, and ignore the ones that I find in a stop state.

pattern2 = r"(mul)\((\d{1,3}),(\d{1,3})\)|(do)\(\)|(don't)\(\)"
on = True

hits = re.findall(pattern2, line)
print(hits)
for hit in hits:
    if hit[0] == "mul" and on:
        answer += int(hit[1]) * int(hit[2])
    elif hit[4] == "don't":
        on = False
    elif hit[3] == "do":
        on = True

Basically what I am doing here is initializing my variable on to True because the instructions say to assume that you start out accepting found muls. With each match one of three things happen. If the match is a mull and my state is on = True then I do the multiplication and add the total. If the the match is don’t, then I change the state to on = False. If the match is do() then I change the state to on = True.