What’s with this?

6 minute read

Published:

The Zen of Python is a well-known document in the Python community. It quite succinctly, and humorously, lays out all the guiding principles for Python’s design. The Zen of Python is never more than a few keystrokes away; it can be viewed at anytime by running the this module either via the terminal or from the Python interrupter:

Shell

python -m this

Python Interrupter

import this

This is an incredibly well-known Easter egg but even with this simple text we can uncover a wealth of interesting information about both Python and Linux. Let’s review this simple module and see what we can uncover!

First we need to find the module itself. A simple Google search and/or an educated guess will probably reveal its location, but I prefer to explore on my own. Let’s start by monitoring the system calls to see what files are opened!

strace -e trace=open,openat,close,read,write,connect,accept python3 -m this |& less

Monitoring the system calls via strace was always going to work, but the output is daunting and requires some parsing to find the answer, even after limiting it with the -e trace= option. Perhaps we can use Python directly:

import this
print(this.__file__)

Now that we have found the file we can view it and call it a day!

s = """Gur Mra bs Clguba, ol Gvz Crgref

Ornhgvshy vf orggre guna htyl.
Rkcyvpvg vf orggre guna vzcyvpvg.
Fvzcyr vf orggre guna pbzcyrk.
Pbzcyrk vf orggre guna pbzcyvpngrq.
Syng vf orggre guna arfgrq.
Fcnefr vf orggre guna qrafr.
Ernqnovyvgl pbhagf.
Fcrpvny pnfrf nera'g fcrpvny rabhtu gb oernx gur ehyrf.
Nygubhtu cenpgvpnyvgl orngf chevgl.
Reebef fubhyq arire cnff fvyragyl.
Hayrff rkcyvpvgyl fvyraprq.
Va gur snpr bs nzovthvgl, ershfr gur grzcgngvba gb thrff.
Gurer fubhyq or bar-- naq cersrenoyl bayl bar --boivbhf jnl gb qb vg.
Nygubhtu gung jnl znl abg or boivbhf ng svefg hayrff lbh'er Qhgpu.
Abj vf orggre guna arire.
Nygubhtu arire vf bsgra orggre guna *evtug* abj.
Vs gur vzcyrzragngvba vf uneq gb rkcynva, vg'f n onq vqrn.
Vs gur vzcyrzragngvba vf rnfl gb rkcynva, vg znl or n tbbq vqrn.
Anzrfcnprf ner bar ubaxvat terng vqrn -- yrg'f qb zber bs gubfr!"""

d = {}
for c in (65, 97):
    for i in range(26):
        d[chr(i+c)] = chr((i+13) % 26 + c)

print("".join([d.get(c, c) for c in s]))

What is this gibberish? This is certainly not the simple print statement I expected. A cursory glance and I can surmise that this is some sort of basic cipher. The adjoining letters in words means this is likely a simple ROT cipher, and if you look a bit further you will see a i+13. At this point we likely have enough information to say this is a ROT13 cipher, but the exact logic is still unclear. Let’s make a copy of the module that we can freely edit and nail down what is going on here!

The first line is clearly setting all the text using a docstring; s likely stands for ‘string’. And line 21 is clearly creating an empty dictionary; d likely stands for ‘dictionary’. But what’s going on after that???

for c in (65, 97):
    for i in range(26):
        d[chr(i+c)] = chr((i+13) % 26 + c)

print("".join([d.get(c, c) for c in s]))

Let’s comment out the print statement and add a few of our own in order to print until it makes sense:

for c in (65, 97):
    for i in range(26):
        print(d[chr(i+c)])
        print(chr((i+13) % 26 + c))
        break

#print("".join([d.get(c, c) for c in s]))

With this above modifications to our code, we should see the following:

A
N
a
n

So the inner for loop is acting on the range 0-25 (there are 26 letters in the alphabet) and updating the dictionary d with letters for the keys and values. The outer for loop ran twice, once for 65 and once for 97, which is why we have 4 outputs, but the relationship with these numbers is unclear. At this point it would behoove us to know more about the chr method:

help(chr)

chr(i, /)
    Return a Unicode string of one character with ordinal i; 0 <= i <= 0x10ffff.

The help method tells us that it returns a unicode string for one character. Let’s look at a unicode chart and see if 65 and 97 have any significance.

It looks like 65 is A (the start of the capital letters) and 97 is a (the start of the lowercase letter). So the line d[chr(i+c)] is adding numbers 0-25 to 65 and 97 for each iteration of the inner loop, giving us the dictionary keys; this is going to happen for each letter of the alphabet. Let’s analyze the next line to see where N/n is coming from.

print(chr((i+13) % 26 + c))

Each iteration of the inner loop is adding 13 to the range (N is 13 greater than A) and performing a modulo 26 + c. This took me a few minutes to work out, the modulo 26 is preventing us from getting an index out of range error. The easiest way to understand this logic is to grab a pen and paper and walk through the math!

Now that we have our dictionary built, let’s take a look at the final line:

print("".join([d.get(c, c) for c in s]))

This line introduces a new method, get, on the dictionary; after a Google search we learn that the first option is required, and it returns the value associated with the key. The second option is optional, and it returns the specified item if that key does not exist. So given a letter a-z or A-Z, d.get(c, c) will return the letter 13 places higher, however if we give it some other character it will simply return c. This must be how spaces and other symbols work! Let’s prove it by changing the default value:

print("".join([d.get(c, '^') for c in s]))

As expected, this prints out all special characters as ^ and is quite horrendous, but it does prove our theory. With that, let’s rewrite this final line in expanded form with longer variables to better understand how the list comprehension is actually working:

new_string = ""
for character in s:
    new_string += "".join([d.get(character, character)])
print(new_string)

So in the end it was indeed a simple, yet elegantly written, ROT13 cipher using unicode. But to me is represents much more. It represents the quirkiness and openness of Linux. We were never stopped from learning and only limited by our own interest or lack thereof.