Day 2 - organisation is difficult.
Complete solution can be found on GitHub
Part One
We are asked to calculate a checksum for the lis of box IDs in the puzzle input.
A checksum is defined as the number of IDs containing exactly two of any letter multiplied by t the number of IDs containing exactly three of any letter.
- Parse the input to get a list of box IDs
- Maintain a counter for the total number of IDs containing exacly two of any letter and a counter for the total number of IDs containing exactly three of any letter
- For each box ID:
- Find the frequency of each letter
- Increment the appropriate counter if any letter occurs exactly two/three times
- Multiply the final values of the two counters to obtain a checksum
Parsing Input
Each line of the puzzle input represents a box ID:
pbopcmjeizuhxlqnwasfgtycdm
pjokrmjeizuhxlqnfasfguycdv
pbokrpjejkuhxlqnwasfgtycdv
sbokrmjeizuhxaqnwastgtycdv
pbokrmjeizuhxljnnasfgtycn
We could read the lines one at a time, however I’m going to read them all into a list for re-use in part two.
- This is feasible as the input file is small (6750 bytes)
with open("input.txt") as f:
box_ids = [l.strip() for l in puzzle_input_f.readlines()]
The use of str.strip
to remove trailing/leading whitespace isn’t strictly necessary as Eric Wastl has provided us tidy, clean input data. However it’s usually a good practice when dealing with external data where trailing/leading whitespace isn’t desired.
Box ID Letter Frequency
A simple approach could be to iterate over the box ID and for each character search the remaining characters for repeat occurences:
frequencies = {}
for i, ch in enumerate(box_id):
if ch not in frequencies:
frequencies[ch]
for j, ch2 in enumerate(box_id):
if i != j and ch == ch2:
frequencies[ch] += 1
This works, but we can do better! Some issues with this approach:
- Inefficient O(n2) runtime caused by the nested loop. For each letter in the box ID, it ‘looks’ at every other letter.
- The problem doesn’t require keeping track of what the letters are, only their frequencies
To improve this, we can first reduce the box ID to it’s unique letters, then count the occurences of each within the full box ID. Uniquifying a string in Python can be done by creating a set
from the string (set
will appear a lot throughout this series):
box_id = "pboxwmjeituhxlqnwasfgtycdv"
unique_letters = set(box_id)
Then we can use str.count
to retrieve the frequencies for each unique letter:
freqs = [box_id.count(l) for l in unique_letters]
Comparing performance of the two by average execution time over 700000 loops:
- Approach 1: 30.4 µs
- Approach 2: 3.54 µs
Calculating The Checksum
With the list of letter frequencies, calculating the checksum requires finding the number of times a frequency is exactly two, the number of times a frequency is exactly three and then multiplying those counts together:
def part_1(box_ids):
twos = 0
threes = 0
for _id in box_ids:
unique_letters = set(_id)
freqs = [_id.count(l) for l in unique_letters]
twos += 1 if [f for f in freqs if f == 2] else 0
threes += 1 if [f for f in freqs if f == 3] else 0
return twos * threes
For my puzzle input, this gives a checksum of 6000
Part Two
We are told there are only two correct box IDs in the list and that they are the only two that differ by exactly one character. We are asked to find the common letters between these two box IDs.
- For each box ID in the list, compare it with every other box ID as follows:
- Compare the letters at each position in the two box IDs
- If the letters differ, increment a counter
- If after comparing all the letters the counter is equal to one, return the two box IDs
- We can return early here to prevent searching the remaining box IDs as we are told in the problem that there is only one pair of box IDs matching this invariant.
To compare each possible pair of box IDs we could use a double for loop:
for id_1 in box_ids:
for id_2 in box_ids:
# compare
However, I am going to use itertools.product
to be more Pythonic:
import itertools
for id_1, id_2 in itertools.product(box_ids, box_ids):
# compare
Then to count the differences between the letters of box ID pair:
diffs = 0
for i, letters in enumerate(zip(id_1, id_2)):
if letters[0] != letters[1]:
diffs +=1
The correct box IDs can then be found by breaking out of the loop when diffs
is equal to one:
for id_1, id_2 in product(box_ids, box_ids):
diffs = 0
for i, chars in enumerate(zip(id_1, id_2)):
if chars[0] != chars[1]:
# we know these aren't the correct box IDs -> stop searching early
if diffs > 1:
break
else:
diffs += 1
if diffs == 1:
# id_1 and id_2 are correct
break
Since only one letter is different between the two correct box IDs, finding the letters that are the same can be achieved by splitting the ID’s at the position i
where the differing letter is located and joining the remaining pieces together:
id_1[i:] + id_2[i:]
Putting it all together:
def part_2(box_ids):
for id_1, id_2 in product(box_ids, box_ids):
diffs = 0
for i, chars in enumerate(zip(id_1, id_2)):
if chars[0] != chars[1]:
if diffs > 1:
break
else:
diffs += 1
if diffs == 1:
return id_1[:i] + id_2[i:]
For my puzzle input, the result is “pbykrmjmizwhxlqnasfgtycdv”.