Captcha
Charvises (the native species which lived on Charvis 8HD before the
first settlers arrived) were very good at math. In a surprising
symbiosis relationship between humans and Charvises, it was agreed that
the Charvises would be responsible for C8.
Can you pass their CAPTCHA (Completely Automated Public Turing Test to
tell Charvises and Humans Apart)?
Look here for a listing of related files.
Let me say, first, that this problem was fascinating to me. Once I started working on it, I couldn’t stop thinking about it until I solved it – and I fully acknowledge that I was a bit out of my depth with this one!
The problem contained a link to a page with a long arithmetic problem, and a text box. Refreshing the page a couple of times showed that the problem changed every couple of seconds or so.
My first approach was to paste the challenge text into my Python interpreter, but I was quickly foiled: the actual text of the problem was composed of seemingly random characters instead of the numbers and symbols they appeared to be!
My next approach was to try and solve the problem at human speed, just to see what happened. My strategy was to get the challenge text into a string, then build the translation character by character. The eventual result of this was the translate.py script, which prompts the user to input the challenge text, and then prompts again for every unknown symbol – displaying all of the translatable text up to that point for reference. Once it can translate the entire string, it evaluates it to get the answer.
I could have sworn there was a “unique” built-in in Python and, to my shame, I ended up with a not very “pythonic” for loop. I always forget about set()!
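The translate.py script itself is not reproduced here, so the following is only a rough sketch of the idea described above. The function name build_translation and its ask parameter are made up for illustration; the point is that set() yields each distinct symbol exactly once.

```python
def build_translation(challenge, ask=input):
    """Ask for the meaning of each distinct symbol, then translate the text.

    `ask` is any callable that takes a prompt and returns the replacement
    character; it defaults to input() for interactive use. (Hypothetical
    helper, not the original translate.py.)
    """
    mapping = {' ': ' '}
    # sorted(set(...)) visits each distinct symbol once, in a stable order.
    for symbol in sorted(set(challenge) - set(mapping)):
        # Show what can be translated so far, with '?' for unknown symbols.
        partial = ''.join(mapping.get(c, '?') for c in challenge)
        mapping[symbol] = ask(f'{partial}\nWhat is {symbol!r}? ')
    return ''.join(mapping[c] for c in challenge)
```

Passing the result to eval() would then give the answer, just far too slowly for this challenge.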
Once I solved the problem, the next page told me that I was correct, but (as I had feared) that I had been too slow!
The next step was to figure out how they were displaying one character, but encoding another. Looking at the source, I got the impression that it had something to do with the base64 encoded blob in the style element.
@font-face{
font-family: DigitalRightsDoneRight;
src:url('data:application/font-ttf;charset=utf-8;base64,AAEAAAALAIAAAwAwR1NVQiCLJXoAAAE4AAAAVE9TLzJWY1+ZAAABjAAAAFZjbWFwMwQwzAAAAiQAAAJ8Z2x5Zupt8M4AAATEAAAHKGhlYWQRhteAAAAA4AAAADZoaGVhBWsBsgAAALwAAAAkaG10eB9hAAAAAAHkAAAAQGxvY2EQQA5CAAAEoAAAACJtYXhwAR0ASQAAARgAAAAgbmFtZfmSK2MAAAvsAAACvnBvc3QIbAhIAAAOrAAAAFQAAQAAAyD/OAAAAmkAAAAAAkkAAQAAAAAAAAAAAAAAAAAAABAAAQAAAAEAAKIml/dfDzz1AAsD6AAAAADYD0oYAAAAANgPShgAAP+AAkkC6wAAAAgAAgAAAAAAAAABAAAAEAA9AAMAAAAAAAIAAAAKAAoAAAD/AAAAAAAAAAEAAAAKADAAPgACREZMVAAObGF0bgAaAAQAAAAAAAAAAQAAAAQAAAAAAAAAAQAAAAFsaWdhAAgAAAABAAAAAQAEAAQAAAABAAgAAQAGAAAAAQAAAAEB9gGQAAUAAAJ6ArwAAACMAnoCvAAAAeAAMQECAAACAAUDAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFBmRWQAQABEAHgDIP84AFoDIADIAAAAAQAAAAAAAAAAAAACLQAAAiwAAAFUAAACXQAAAkcAAAJKAAACHAAAAVQAAAG/AAACXAAAAksAAAIcAAACaQAAAhwAAAJPAAAAAAAFAAAAAwAAACwAAAAEAAABuAABAAAAAACyAAMAAQAAACwAAwAKAAABuAAEAIYAAAAWABAAAwAGAEQASABPAFcAYQBrAG0AcAB2AHj//wAAAEQASABOAFMAYQBrAG0AcAB2AHj//wAAAAAAAAAAAAAAAAAAAAAAAAAAAAEAFgAWABYAGAAgACAAIAAgACAAIAAAAA8ACQAOAAMABwAEAAUAAQAIAAoADQACAAsADAAGAAABBgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPAAAACQAAAAAADgMAAAAHBAUBCAAAAAAAAAAAAAoAAAAAAAAAAAANAAIAAAsAAAAAAAwABgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMAAAAAADEAAAAAAAAAA8AAABEAAAARAAAAA8AAABIAAAASAAAAAkAAABOAAAATgAAAA4AAABPAAAATwAAAAMAAABTAAAAUwAAAAcAAABUAAAAVAAAAAQAAABVAAAAVQAAAAUAAABWAAAAVgAAAAEAAABXAAAAVwAAAAgAAABhAAAAYQAAAAoAAABrAAAAawAAAA0AAABtAAAAbQAAAAIAAABwAAAAcAAAAAsAAAB2AAAAdgAAAAwAAAB4AAAAeAAAAAYAAAAAAEYAjgC+AQwBYAGsAe4CHAJIAo4C3AMGAzYDTAOUAAAAAQAAAAAB8QLAAC8AACUWFAYjIicmIyIHIiY3PgE3NjU0JiMiDgEVDgEiJjU0PgEyHgEVFAcOAQc2MzIXMgHnChYQDCxjH21MDxICDHJoe04+JzwfARInETZifF80nUJlEidYSHIRPgkdEgIDBBANSYVJV2Y8TCc/JA8OFBMyWTU1XDiFbSxjLgIDAAABAAAAAAH+ArgAMgAAARQGIyIHBgcOASMiJjU0NzY3BiMiNTQ2MzI3NjU0JiMiBgcGIyImNTQ3PgEzMhUUBzcyAf4RETIaLzoEFwwRFgMtNBgxJBQQQR4jNUgvWTIDBw4Q
HR9qO8sfMycBUw8SAYyQCgsSDAcGZaABIA8SAW07MC4PDAERDhYLChOYPXABAAABAAD/gAEiAusAHQAAFyYnLgE1ND4BNzYyFhUUBg8BDgEVFB4BFxYVFAYi7wUGVV0rVDoKGBULCA9GRC49KRMUFX8BBkjggFmtiiQHEQ4LDQYMN8d0Yo5UKhUPDBIAAAADAAAAAAI5ArEAGwAkADMAAAEWFRQOASIuATU0PgE3LgE1ND4BMzIWFRQGBxYnBhQWMjY0JiITPgE1NC4BIg4BFRQWMzICDC1EeJl9RzNTMCEsL00sR1slH1DwHDhQMTBSkCsqMVZuWzRpXj0BXDVMQ2I0NmZEL1E5DA87KC9FJFFDKDsSGMoZVDI2Ui/95BVGKC5KKitGJklbAAABAAAAAAITArIAPAAANyY1NDYzMhUeATMyNjU0JiMiBgcGIyImPwE2NxYzMjc2MzIVFAYjIg8BIicmIwcUMzc+ATMyHgEVFAYjIm05FBIoAUJPVF1VUh00NQYKERcBAgkBWHAmOC4THhIQJxZeNjQHDQkJCiMyG0ZnNYl0cTw8aBITIk1TT1BUXwoNAhMPGqZUBAICHxASAQECAacKAgkKP29Ha3YAAAABAAAAAAIkAqwANwAAARYVFA4BIyA1NDIVFDMyPgE1NCYjIjU0NjMyNTQuASIOARUUFhUUBiMiJyY1ND4BMzIeARQGBxYCCBw0Z0j+50zFM0klTFQfDRGQHzlLQCYGExIbCQk5Yjo2XzkyKzIBJS47LFY4+CojvyI2H0VGHhEScx02IiA5JAYcBwwPFBocM1QvL1JmTRIUAAEAAAAAAbwB4QArAAAlFAYjIiYvAQYHBiMiJjU0NzY3Jy4BNTQ2MzIWFxYXNz4BMzIWFRQGDwEXFgG8Ew0KDQptNTYODwwWBzw5dAgEFA8KDQkcSVwIDAkRFAMFZ30IrAsSCQltOjcOEgkMBz8/cQgJBgsTCQkfSWYJCBENBwkGcHsIAAEAAP+AASIC6wAcAAAXJjU0Nj8BPgE1NC4BJyY1NDYyFhceARUUBgcGIj0LDAkNRkQuPSkTFBUJBlVeY1YKGHcIDgsPBgo4x3NijlUpExEMEgIGSt9/h/Q5BwAAAQAAAAABNwKhABoAACUUBiMiNREGBw4BIyImNTQ2NzY/AT4BMzIWFQE3FBMjNikDDQYQEwgGJBs4BxgMFRwjDxIhAhRGLwMEEA4GEAcqI0kLChcUAAAAAAIAAAAAAiACpwAiAC8AAAEeARUUBiMiJjU0PgEzMhYXFhUUBiImJy4CIyIGBz4BMzITNjU0LgEjIgYVFBYyAbY1NYJtgoVEgFdKYiMHGBgNCxQhOChfbAMbbTxMLCchRjVFZVaiAXsfYjlYZ5mXb6ldSkEOCA4TCw8iKR6gkCw1/s0lNyVGLURGQU4AAQAA//0CQAKyADMAAAEWFRQGBwYPARQGIiY/AQYjIicmNTQ3PgE3NjIWFAcOAQcGFRQXFjMyPwE0MzIWDwE2NzYCNwkNETU1BxQlEgEHLGJtRBwHQ31RCR4UE014IAINMVsmRgkjFBQBCUIiDgEQCREOEQIGA60PExQRpwERBxkNDYPCUQkTHBRQsk8GAgUDBwLiIRUS2AYEAgAAAQAAAAAB7wIHABsAAAEWFRQjJxUUBiImPQEmIyI0Mxc1NDMyFh0BFzIB6Ackng0jDDRwHyCiHxENnxUBTwgQHQKZEBIRE5cBPAGRHg8PkgEAAAAAAgAAAAACSQKUAA4AHgAANy4BNTQ+ATIeARUUDgEiNz4BNTQuASIOARUUHgEzMpw+Pj18tn09Qn+uxC4wMFp7Wy8uWDxAOCqLT1SeZmOeV1GLU2AfbUNKgE5PgUhCbUAAAQAAAAAB0QFbAAoAABM0NjMlMhUUIwUiTBEWATUpK/7JIwE4EBADIR0DAAACAAAAAAIxArEAIQAwAAABHgEVFA4BIyIuATU0NjIWFRQWMzI2Nw4BIyImNTQ+ATMy
Ez4BNTQuASMiDgEVFDMyAbY8Pz15Vk5zPRQgFV9TYGgFGG9AcH48a0RUAScvLU4wM0kkoysCiyaJXHGtYDRbORAVExE8TaOQODxxbDthN/6fEkYvLEcoK0UloAAAAAAAABAAxgABAAAAAAABABYAAAABAAAAAAACAAcAFgABAAAAAAADABYAHQABAAAAAAAEABYAMwABAAAAAAAFAAsASQABAAAAAAAGABYAVAABAAAAAAAKACsAagABAAAAAAALABMAlQADAAEECQABACwAqAADAAEECQACAA4A1AADAAEECQADACwA4gADAAEECQAEACwBDgADAAEECQAFABYBOgADAAEECQAGACwBUAADAAEECQAKAFYBfAADAAEECQALACYB0kRpZ2l0YWxSaWdodHNEb25lUmlnaHRSZWd1bGFyRGlnaXRhbFJpZ2h0c0RvbmVSaWdodERpZ2l0YWxSaWdodHNEb25lUmlnaHRWZXJzaW9uIDEuMERpZ2l0YWxSaWdodHNEb25lUmlnaHRHZW5lcmF0ZWQgYnkgc3ZnMnR0ZiBmcm9tIEZvbnRlbGxvIHByb2plY3QuaHR0cDovL2ZvbnRlbGxvLmNvbQBEAGkAZwBpAHQAYQBsAFIAaQBnAGgAdABzAEQAbwBuAGUAUgBpAGcAaAB0AFIAZQBnAHUAbABhAHIARABpAGcAaQB0AGEAbABSAGkAZwBoAHQAcwBEAG8AbgBlAFIAaQBnAGgAdABEAGkAZwBpAHQAYQBsAFIAaQBnAGgAdABzAEQAbwBuAGUAUgBpAGcAaAB0AFYAZQByAHMAaQBvAG4AIAAxAC4AMABEAGkAZwBpAHQAYQBsAFIAaQBnAGgAdABzAEQAbwBuAGUAUgBpAGcAaAB0AEcAZQBuAGUAcgBhAHQAZQBkACAAYgB5ACAAcwB2AGcAMgB0AHQAZgAgAGYAcgBvAG0AIABGAG8AbgB0AGUAbABsAG8AIABwAHIAbwBqAGUAYwB0AC4AaAB0AHQAcAA6AC8ALwBmAG8AbgB0AGUAbABsAG8ALgBjAG8AbQAAAAIAAAAAAAAACgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEAECAQMBBAEFAQYBBwEIAQkBCgELAQwBDQEOAQ8BEAERAAAAAAAAAAAAAAAAAAAAAAAA');
}
Refreshing the page a couple of times and comparing the sources showed that blob changed every time the challenge text changed, so there would probably be no way to create a map based on previously received information.
Continuing the recon, I pasted the whole base64 blob into a text file, then attempted to decode and analyze it:
$ base64 -d font.txt | file -
# /dev/stdin: TrueType Font data, 11 tables, 1st "GSUB", 16 names, Macintosh, type 1 string, DigitalRightsDoneRightRegularDigitalRightsDoneRightDigitalRightsDoneRightVersion 1.0DigitalRigh
Which was my cue to start reading about TrueType fonts. I ended up getting most of my information from the Microsoft OpenType specification, but didn’t come across that site until I had wasted a sufficient amount of time on the Apple TrueType documentation, and various blogs that promised to “take an in-depth look at the TrueType font file format”.
Really, the Microsoft documentation wasn’t that good either, but that might just be due to the fact that the file format itself is so wonky. In summary, TrueType Font files are composed of many nested and cross-referenced tables. The ones we care about for this problem are the ones responsible for storing the actual “contour” information for each glyph, and the ones responsible for mapping those contours to the actual encoded characters we want to display (or, in our case, decode).
Before we can parse all those tables, we have to get the data. I did this using the inimitable Python requests module. Here’s the code I used to request the challenge and submit a bogus response. (The site isn’t running anymore, as the contest has ended. However, Square publishes a Docker container that will run this challenge for you locally.)
import re
import requests
def get_answer(data):
    token = re.search(r'"token" value="(\d+)"', data).group(1)
    return {'token': token, 'answer': 1234}
URL = 'https://hidden-island-93990.squarectf.com/ea6c95c6d0ff24545cad'
r1 = requests.get(URL)
r2 = requests.post(URL, data=get_answer(r1.text))
print(r2.text)
Currently, all the get_answer function does is parse out the token for the problem, which is a hidden field in the form submission. I’m doing all the HTML parsing with quick-n-dirty regular expressions, which isn’t generally a good idea, but is sufficient for our purposes in this problem.
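To see what that regular expression actually matches, here is a small sketch run against a stand-in page. The SAMPLE_PAGE markup and the extract_token name are assumptions; the real challenge page may be laid out differently. Unlike the quick version above, this one returns None instead of raising when the field is missing.

```python
import re

# Hypothetical stand-in for the challenge page; the real markup may differ.
SAMPLE_PAGE = '''
<form method="post">
  <input type="hidden" name="token" value="1541020123">
  <input type="text" name="answer">
</form>
'''

def extract_token(page):
    """Return the digits of the hidden token field, or None if absent."""
    match = re.search(r'"token" value="(\d+)"', page)
    return match.group(1) if match else None

print(extract_token(SAMPLE_PAGE))
```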
If we point this code at a server running this challenge, we will almost always (unless we’re incredibly lucky and the correct answer is 1234) get a rejection message. But we’re far from done. Let’s sketch out everything we’ll need for parsing out the correct answer:
def get_mapping(font):
    # TODO
    pass
def eval_answer(problem, mapping):
    # TODO
    pass
def get_answer(data):
    font = re.search(r'base64,(.+)\'', data).group(1)
    token = re.search(r'"token" value="(\d+)"', data).group(1)
    problem = re.search(r'<p>(.+)</p>', data).group(1)
    answer = eval_answer(problem, get_mapping(font))
    return {'token': token, 'answer': answer}
The regular expressions absolutely will not work in general cases. Especially the one that parses problem. But the document we’re parsing for this problem is simple enough that we shouldn’t run into any problems. eval_answer is easy: we already wrote that in the previous attempt.
def eval_answer(problem, mapping):
    translation = ''.join([mapping[x] for x in problem])
    print(translation)
    return eval(translation)
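As a toy illustration of the translation step, here is a sketch with an invented mapping (the real one is rebuilt from the font on every request, and toy_mapping and translate are names made up for this example). Since eval() will run whatever it is given, the sketch also checks that the translated string contains nothing but arithmetic before evaluating it.

```python
def translate(problem, mapping):
    """Replace every obfuscated symbol with the character it renders as."""
    return ''.join(mapping[symbol] for symbol in problem)

# Hypothetical mapping, for illustration only.
toy_mapping = {
    'Q': '(', 'z': '7', 'k': '+', 'W': '2',
    'v': ')', 'x': '*', 'D': '3', ' ': ' ',
}
expression = translate('Q z k W v x D', toy_mapping)

# Refuse to eval anything other than digits, arithmetic operators and spaces.
assert set(expression) <= set('0123456789+-*() ')
print(expression, '=', eval(expression))
```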
Now we’ve reduced the problem to “given a TrueType font, map glyphs to the characters they represent”. Easy, right‽
First things first: convert base64 data to bytes. Python modules to the rescue!
import base64
def get_mapping(font):
    raw = base64.b64decode(font)
    # TODO
Now that we have the raw data to parse, we can apply what we learned reading the OpenType documents. The first 12 bytes of the file are a special offset table that indicates how many tables there are. This is followed immediately by a table of fixed-width records, one for each table in the rest of the document. Each of these records contains the name, checksum, offset, and size of the table it represents. Do you think maybe there should be more tables?
def parse_ttf_tables(data):
    tables = {
        'offset': {
            'type': int.from_bytes(data[:4], 'big'),
            'numTables': int.from_bytes(data[4:6], 'big'),
            'searchRange': int.from_bytes(data[6:8], 'big'),
            'entrySelector': int.from_bytes(data[8:10], 'big'),
            'rangeShift': int.from_bytes(data[10:12], 'big')
        }
    }
    for i in range(tables['offset']['numTables']):
        j = 12 + (i * 16)
        cs = int.from_bytes(data[j + 4:j + 8], 'big')
        ofs = int.from_bytes(data[j + 8:j + 12], 'big')
        length = int.from_bytes(data[j + 12:j + 16], 'big')
        tables[data[j:j + 4].decode('ascii')] = {
            'checkSum': cs,
            'offset': ofs,
            'length': length,
            'data': data[ofs:ofs + length]
        }
    return tables
I’ve tried to use as much of the official terminology as possible in the dict keys and variable names, so that it’s obvious what’s being parsed from where. This function’s primary purpose is to map the raw data for each table in the file by name.
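As a quick sanity check on the offset table layout, the first 16 base64 characters of the font blob shown earlier decode to exactly those 12 bytes. This sketch uses the struct module rather than int.from_bytes, purely as an alternative way to read the same fields; the variable names are my own.

```python
import base64
import struct

# The first 16 base64 characters of the blob are the 12-byte offset table.
header = base64.b64decode('AAEAAAALAIAAAwAw')

# '>' = big-endian, I = unsigned 32-bit, H = unsigned 16-bit
sfnt_version, num_tables, search_range, entry_selector, range_shift = \
    struct.unpack('>IHHHH', header)

print(hex(sfnt_version), num_tables, search_range, entry_selector, range_shift)
# → 0x10000 11 128 3 48
```

The 11 agrees with the "11 tables" that file reported, and 0x00010000 is the version number that marks TrueType outlines.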
Now the tables we really care about are called glyf and cmap. Unfortunately, glyf depends on a table called loca, and loca depends on a couple of tables called head and maxp.
head is basically just a table of named integers, and is easy enough to parse.
def parse_head(data):
    return {
        'majorVersion': int.from_bytes(data[0:2], 'big'),
        'minorVersion': int.from_bytes(data[2:4], 'big'),
        'fontRevision': data[4:8],
        'checkSumAdjustment': int.from_bytes(data[8:12], 'big'),
        'magicNumber': int.from_bytes(data[12:16], 'big'),
        'flags': int.from_bytes(data[16:18], 'big'),
        'unitsPerEm': int.from_bytes(data[18:20], 'big'),
        'created': int.from_bytes(data[20:28], 'big', signed=True),
        'modified': int.from_bytes(data[28:36], 'big', signed=True),
        'xMin': int.from_bytes(data[36:38], 'big', signed=True),
        'yMin': int.from_bytes(data[38:40], 'big', signed=True),
        'xMax': int.from_bytes(data[40:42], 'big', signed=True),
        'yMax': int.from_bytes(data[42:44], 'big', signed=True),
        'macStyle': int.from_bytes(data[44:46], 'big'),
        'lowestRecPPEM': int.from_bytes(data[46:48], 'big'),
        'fontDirectionHint': int.from_bytes(data[48:50], 'big', signed=True),
        'indexToLocFormat': int.from_bytes(data[50:52], 'big', signed=True),
        'glyphDataFormat': int.from_bytes(data[52:54], 'big', signed=True),
    }
maxp is essentially the same, except that there are a couple of different versions. As Dave Jones would say, I may have gilded the lily with some of this, but at the time I really didn’t know what would or wouldn’t be important.
def parse_maxp(data):
    version = data[0:4]
    if version == b'\x00\x00\x50\x00':
        # Version 0.5
        return {
            'version': version,
            'numGlyphs': int.from_bytes(data[4:6], 'big'),
        }
    else:
        # Version 1.0
        return {
            'version': version,
            'numGlyphs': int.from_bytes(data[4:6], 'big'),
            'maxPoints': int.from_bytes(data[6:8], 'big'),
            'maxContours': int.from_bytes(data[8:10], 'big'),
            'maxCompositePoints': int.from_bytes(data[10:12], 'big'),
            'maxCompositeContours': int.from_bytes(data[12:14], 'big'),
            'maxZones': int.from_bytes(data[14:16], 'big'),
            'maxTwilightPoints': int.from_bytes(data[16:18], 'big'),
            'maxStorage': int.from_bytes(data[18:20], 'big'),
            'maxFunctionDefs': int.from_bytes(data[20:22], 'big'),
            'maxInstructionDefs': int.from_bytes(data[22:24], 'big'),
            'maxStackElements': int.from_bytes(data[24:26], 'big'),
            'maxSizeOfInstructions': int.from_bytes(data[26:28], 'big'),
            'maxComponentElements': int.from_bytes(data[28:30], 'big'),
            'maxComponentDepth': int.from_bytes(data[30:32], 'big'),
        }
As it turned out, the maximum number of “twilight points” didn’t factor into the final solution. Live and learn! Now that we’ve parsed the dependencies for loca, we can parse loca itself. We really only needed the number of glyphs from maxp, and the size of the records in loca from head.
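def parse_loca(data, head, maxp):
    # Short format (0): 2-byte entries holding offset / 2.
    # Long format (1): 4-byte entries holding the actual offset.
    short = head['indexToLocFormat'] == 0
    size = 2 if short else 4
    n = maxp['numGlyphs'] + 1
    return [
        int.from_bytes(data[i * size:(i + 1) * size], 'big') * (2 if short else 1)
        for i in range(n)
    ]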
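This function gave me quite a bit of trouble, because the format requires a couple of adjustments that look like fudge factors. Notice the + 1 at the end of the assignment to n, and the * 2 applied to each value in the list comprehension. The + 1 is there because loca stores one more offset than there are glyphs, so that the end of the last glyph is known. The * 2 is there because the short format (indexToLocFormat of 0) stores each offset divided by two; the long format stores offsets as-is, so the multiplication should only be applied to the short format.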
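To make the role of those two adjustments concrete, here is a small sketch that slices glyph data out of a made-up glyf table. The glyph_slices helper and the sample bytes are invented for illustration. Note how two equal consecutive offsets produce an empty glyph, which is why an empty bytes key turns out to be useful later.

```python
import struct

def glyph_slices(loca_data, glyf_data, num_glyphs, short_format=True):
    """Yield the raw glyf bytes of each glyph, in glyph-index order."""
    # loca holds num_glyphs + 1 offsets: the extra one marks the end
    # of the last glyph.
    fmt = '>%d%s' % (num_glyphs + 1, 'H' if short_format else 'I')
    offsets = struct.unpack(fmt, loca_data[:struct.calcsize(fmt)])
    if short_format:
        # Short offsets are stored divided by two.
        offsets = [o * 2 for o in offsets]
    # Glyph i occupies glyf[offsets[i]:offsets[i + 1]].
    for start, end in zip(offsets, offsets[1:]):
        yield glyf_data[start:end]

# Made-up data: three glyphs of 0, 4 and 2 bytes.
glyf = b'AAAABB'
loca = struct.pack('>4H', 0, 0, 2, 3)  # stored values are offset / 2
print(list(glyph_slices(loca, glyf, num_glyphs=3)))
# → [b'', b'AAAA', b'BB']
```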
Now that we’ve parsed loca, we should be able to parse glyf. Well, this is sort of where I gave up. As it turns out, there’s a whole glyph contour description language, and I was running out of time to finish this problem. So, I decided to be lazy. Once I was able to get the chunk of data corresponding to each glyph description, I simply stuck those into the keys of a dict, and guessed at the correct translation.
knownglyphs = {
b'': 0,
b'\x00\x01\x00\x00\x00\x00\x02$\x02\xac\x007\x00\x00\x01\x16\x15\x14\x0e\x01# 542\x15\x1432>\x0154&#"5463254.\x01"\x0e\x01\x15\x14\x16\x15\x14\x06#"\'&54>\x0132\x1e\x01\x14\x06\x07\x16\x02\x08\x1c4gH\xfe\xe7L\xc53I%LT\x1f\r\x11\x90\x1f9K@&\x06\x13\x12\x1b\t\t9b:6_92+2\x01%.;,V8\xf8*#\xbf"6\x1fEF\x1e\x11\x12s\x1d6" 9$\x06\x1c\x07\x0c\x0f\x14\x1a\x1c3T//RfM\x12\x14': '3',
b'\x00\x01\x00\x00\xff\x80\x01"\x02\xeb\x00\x1c\x00\x00\x17&546?\x01>\x0154.\x01\'&5462\x16\x17\x1e\x01\x15\x14\x06\x07\x06"=\x0b\x0c\t\rFD.=)\x13\x14\x15\t\x06U^cV\n\x18w\x08\x0e\x0b\x0f\x06\n8\xc7sb\x8eU)\x13\x11\x0c\x12\x02\x06J\xdf\x7f\x87\xf49\x07\x00': ')',
b'\x00\x01\x00\x00\x00\x00\x017\x02\xa1\x00\x1a\x00\x00%\x14\x06#"5\x11\x06\x07\x0e\x01#"&54676?\x01>\x0132\x16\x15\x017\x14\x13#6)\x03\r\x06\x10\x13\x08\x06$\x1b8\x07\x18\x0c\x15\x1c#\x0f\x12!\x02\x14F/\x03\x04\x10\x0e\x06\x10\x07*#I\x0b\n\x17\x14\x00\x00\x00': '1',
b'\x00\x02\x00\x00\x00\x00\x021\x02\xb1\x00!\x000\x00\x00\x01\x1e\x01\x15\x14\x0e\x01#".\x015462\x16\x15\x14\x163267\x0e\x01#"&54>\x0132\x13>\x0154.\x01#"\x0e\x01\x15\x1432\x01\xb6<?=yVNs=\x14 \x15_S`h\x05\x18o@p~<kDT\x01\'/-N03I$\xa3+\x02\x8b&\x89\\q\xad`4[9\x10\x15\x13\x11<M\xa3\x908<ql;a7\xfe\x9f\x12F/,G(+E%\xa0\x00\x00\x00': '9',
b'\x00\x01\x00\x00\x00\x00\x01\xf1\x02\xc0\x00/\x00\x00%\x16\x14\x06#"\'&#"\x07"&7>\x017654&#"\x0e\x01\x15\x0e\x01"&54>\x012\x1e\x01\x15\x14\x07\x0e\x01\x07632\x172\x01\xe7\n\x16\x10\x0c,c\x1fmL\x0f\x12\x02\x0crh{N>\'<\x1f\x01\x12\'\x116b|_4\x9dBe\x12\'XHr\x11>\t\x1d\x12\x02\x03\x04\x10\rI\x85IWf<L\'?$\x0f\x0e\x14\x132Y55\\8\x85m,c.\x02\x03\x00': '2',
b'\x00\x02\x00\x00\x00\x00\x02 \x02\xa7\x00"\x00/\x00\x00\x01\x1e\x01\x15\x14\x06#"&54>\x0132\x16\x17\x16\x15\x14\x06"&\'.\x02#"\x06\x07>\x0132\x13654.\x01#"\x06\x15\x14\x162\x01\xb655\x82m\x82\x85D\x80WJb#\x07\x18\x18\r\x0b\x14!8(_l\x03\x1bm<L,\'!F5EeV\xa2\x01{\x1fb9Xg\x99\x97o\xa9]JA\x0e\x08\x0e\x13\x0b\x0f")\x1e\xa0\x90,5\xfe\xcd%7%F-DFAN': '6',
b'\x00\x01\x00\x00\x00\x00\x01\xfe\x02\xb8\x002\x00\x00\x01\x14\x06#"\x07\x06\x07\x0e\x01#"&54767\x06#"546327654&#"\x06\x07\x06#"&547>\x0132\x15\x14\x0772\x01\xfe\x11\x112\x1a/:\x04\x17\x0c\x11\x16\x03-4\x181$\x14\x10A\x1e#5H/Y2\x03\x07\x0e\x10\x1d\x1fj;\xcb\x1f3\'\x01S\x0f\x12\x01\x8c\x90\n\x0b\x12\x0c\x07\x06e\xa0\x01 \x0f\x12\x01m;0.\x0f\x0c\x01\x11\x0e\x16\x0b\n\x13\x98=p\x01\x00': '7',
b'\x00\x01\x00\x00\xff\xfd\x02@\x02\xb2\x003\x00\x00\x01\x16\x15\x14\x06\x07\x06\x0f\x01\x14\x06"&?\x01\x06#"\'&547>\x01762\x16\x14\x07\x0e\x01\x07\x06\x15\x14\x17\x1632?\x01432\x16\x0f\x01676\x027\t\r\x1155\x07\x14%\x12\x01\x07,bmD\x1c\x07C}Q\t\x1e\x14\x13Mx \x02\r1[&F\t#\x14\x14\x01\tB"\x0e\x01\x10\t\x11\x0e\x11\x02\x06\x03\xad\x0f\x13\x14\x11\xa7\x01\x11\x07\x19\r\r\x83\xc2Q\t\x13\x1c\x14P\xb2O\x06\x02\x05\x03\x07\x02\xe2!\x15\x12\xd8\x06\x04\x02\x00': '4',
b'\x00\x01\x00\x00\x00\x00\x01\xef\x02\x07\x00\x1b\x00\x00\x01\x16\x15\x14#\'\x15\x14\x06"&=\x01&#"43\x175432\x16\x1d\x01\x172\x01\xe8\x07$\x9e\r#\x0c4p\x1f \xa2\x1f\x11\r\x9f\x15\x01O\x08\x10\x1d\x02\x99\x10\x12\x11\x13\x97\x01<\x01\x91\x1e\x0f\x0f\x92\x01\x00\x00\x00': '+',
b'\x00\x02\x00\x00\x00\x00\x02I\x02\x94\x00\x0e\x00\x1e\x00\x007.\x0154>\x012\x1e\x01\x15\x14\x0e\x01"7>\x0154.\x01"\x0e\x01\x15\x14\x1e\x0132\x9c>>=|\xb6}=B\x7f\xae\xc4.00Z{[/.X<@8*\x8bOT\x9efc\x9eWQ\x8bS`\x1fmCJ\x80NO\x81HBm@': '0',
b'\x00\x01\x00\x00\xff\x80\x01"\x02\xeb\x00\x1d\x00\x00\x17&\'.\x0154>\x01762\x16\x15\x14\x06\x0f\x01\x0e\x01\x15\x14\x1e\x01\x17\x16\x15\x14\x06"\xef\x05\x06U]+T:\n\x18\x15\x0b\x08\x0fFD.=)\x13\x14\x15\x7f\x01\x06H\xe0\x80Y\xad\x8a$\x07\x11\x0e\x0b\r\x06\x0c7\xc7tb\x8eT*\x15\x0f\x0c\x12\x00\x00': '(',
b'\x00\x03\x00\x00\x00\x00\x029\x02\xb1\x00\x1b\x00$\x003\x00\x00\x01\x16\x15\x14\x0e\x01".\x0154>\x017.\x0154>\x0132\x16\x15\x14\x06\x07\x16\'\x06\x14\x16264&"\x13>\x0154.\x01"\x0e\x01\x15\x14\x1632\x02\x0c-Dx\x99}G3S0!,/M,G[%\x1fP\xf0\x1c8P10R\x90+*1Vn[4i^=\x01\\5LCb46fD/Q9\x0c\x0f;(/E$QC(;\x12\x18\xca\x19T26R/\xfd\xe4\x15F(.J*+F&I[\x00': '8',
b'\x00\x01\x00\x00\x00\x00\x01\xbc\x01\xe1\x00+\x00\x00%\x14\x06#"&/\x01\x06\x07\x06#"&54767\'.\x0154632\x16\x17\x16\x177>\x0132\x16\x15\x14\x06\x0f\x01\x17\x16\x01\xbc\x13\r\n\r\nm56\x0e\x0f\x0c\x16\x07<9t\x08\x04\x14\x0f\n\r\t\x1cI\\\x08\x0c\t\x11\x14\x03\x05g}\x08\xac\x0b\x12\t\tm:7\x0e\x12\t\x0c\x07??q\x08\t\x06\x0b\x13\t\t\x1fIf\t\x08\x11\r\x07\t\x06p{\x08': '*',
b'\x00\x01\x00\x00\x00\x00\x01\xd1\x01[\x00\n\x00\x00\x13463%2\x15\x14#\x05"L\x11\x16\x015)+\xfe\xc9#\x018\x10\x10\x03!\x1d\x03\x00': '-',
b'\x00\x01\x00\x00\x00\x00\x02\x13\x02\xb2\x00<\x00\x007&54632\x15\x1e\x0132654&#"\x06\x07\x06#"&?\x0167\x16327632\x15\x14\x06#"\x0f\x01"\'&#\x07\x1437>\x0132\x1e\x01\x15\x14\x06#"m9\x14\x12(\x01BOT]UR\x1d45\x06\n\x11\x17\x01\x02\t\x01Xp&8.\x13\x1e\x12\x10\'\x16^64\x07\r\t\t\n#2\x1bFg5\x89tq<<h\x12\x13"MSOPT_\n\r\x02\x13\x0f\x1a\xa6T\x04\x02\x02\x1f\x10\x12\x01\x01\x02\x01\xa7\n\x02\t\n?oGkv\x00\x00': '5',
}
def parse_glyph(data, loca):
    print(len(data))
    out = []
    for i in range(len(loca) - 1):
        block = data[loca[i]:loca[i + 1]]
        out.append(knownglyphs[block])
    return out
This was the final, correct translation. The parse_glyph function returns a list of translated glyphs (based on the knownglyphs dict) in order of their “glyph index”. I’ll talk about how I determined the correct glyph-to-character translation after we parse cmap.
cmap can provide different character-to-glyph mappings for different platforms and character encoding schemes. The one I ended up using is the format 0 subtable, a simple byte encoding table.
def parse_cmap(data):
    # version = int.from_bytes(data[0:2], 'big')
    numTables = int.from_bytes(data[2:4], 'big')
    records = [
        {
            'platformID': int.from_bytes(data[i:i + 2], 'big'),
            'encodingID': int.from_bytes(data[i + 2:i + 4], 'big'),
            'offset': int.from_bytes(data[i + 4:i + 8], 'big'),
        }
        for i in range(4, (numTables * 8) + 4, 8)
    ]
    for r in records:
        ofs = r['offset']
        r['format'] = int.from_bytes(data[ofs:ofs + 2], 'big')
        if r['format'] == 0:
            r['length'] = int.from_bytes(data[ofs + 2:ofs + 4], 'big')
            r['language'] = int.from_bytes(data[ofs + 4:ofs + 6], 'big')
            r['glyphIDArray'] = [
                b for b in data[ofs + 6:ofs + r['length']]
            ]
        elif r['format'] == 4:
            r['length'] = int.from_bytes(data[ofs + 2:ofs + 4], 'big')
            r['data'] = data[ofs:ofs + r['length']]
        elif r['format'] == 12:
            r['length'] = int.from_bytes(data[ofs + 4:ofs + 8], 'big')
            r['data'] = data[ofs:ofs + r['length']]
        else:
            raise ValueError(f'unsupported fmt = {r["format"]}')
    return records
I was lazy and didn’t really parse the other formats, and there are other possible formats that didn’t appear in any of these files that I didn’t handle at all. The meat of this table is in the glyphIDArray, in which the index of each element is a character code (for the ASCII characters used here, the same as the Unicode code point), and the value of the element is the index of the glyph to render for that character.
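A small sketch of reading such an array, using an invented 256-entry glyphIDArray. The chars_to_glyph_indices name and the sample values are for illustration only. Glyph index 0 is the "missing glyph", so entries with that value are skipped.

```python
def chars_to_glyph_indices(glyph_id_array):
    """Map each character to its glyph index, skipping unmapped entries (0)."""
    return {
        chr(code): glyph_index
        for code, glyph_index in enumerate(glyph_id_array)
        if glyph_index != 0
    }

# Illustrative 256-entry array: 'D' renders glyph 15 and 'x' renders glyph 6.
glyph_id_array = bytearray(256)
glyph_id_array[ord('D')] = 15
glyph_id_array[ord('x')] = 6

print(chars_to_glyph_indices(glyph_id_array))
# → {'D': 15, 'x': 6}
```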
Finally, we can combine the mapping from glyph index to translated character with the mapping from character to glyph index to obtain the final translation dict. Whew!
def get_mapping(font):
    raw = base64.b64decode(font)
    tables = parse_ttf_tables(raw)
    head = parse_head(tables['head']['data'])
    maxp = parse_maxp(tables['maxp']['data'])
    loca = parse_loca(tables['loca']['data'], head, maxp)
    glyph = parse_glyph(tables['glyf']['data'], loca)
    print(glyph)
    cmap = parse_cmap(tables['cmap']['data'])
    map = [x['glyphIDArray'] for x in cmap if x['format'] == 0][0]
    charMap = {chr(i): map[i] for i in range(len(map)) if map[i] != 0}
    print(charMap)
    glyphMap = {k: glyph[v] for k, v in charMap.items()}
    glyphMap[' '] = ' '
    return glyphMap
Unfortunately, we’re not quite done. We still need to determine the correct translation from glyph to character. I captured the file returned by a request, then displayed it with my browser.
r1 = requests.get(URL)
with open('temp.html', 'w') as f:
    f.write(r1.text)
r2 = requests.post(URL, data=get_answer(r1.text))
print(r2.text)
Most of the time, this would fail at the eval step because the parentheses didn’t match. I compared the rendered characters to the translated characters and swapped the knownglyphs dict items around until the translation came out right.
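I did that comparison by eye, but it could be sketched in code along these lines. The suggest_fixes helper is hypothetical, and the rendered string is assumed to have been typed in by hand from what the browser shows.

```python
def suggest_fixes(translated, rendered):
    """Return {wrong_char: right_char} for every position where the script's
    translation disagrees with what the browser actually shows."""
    if len(translated) != len(rendered):
        raise ValueError('strings must be the same length to compare')
    return {t: r for t, r in zip(translated, rendered) if t != r}

# Hypothetical example: the two parenthesis glyphs were guessed the wrong
# way round, so their knownglyphs values need to be swapped.
print(suggest_fixes(')1 + 2( * 3', '(1 + 2) * 3'))
# → {')': '(', '(': ')'}
```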
Unfortunately, I lost the exact output from the server for the correct answer. You can see for yourself by downloading the docker container from the challenge page.
Thanks for reading!