Substitution Ciphers – Not Just for the Ancient Roman Military

Recall from a previous post that we were trying to get multiple “distinct” fields from a Cloudant search index.  We did this by concatenating the two fields during index creation.  Recall also that there was a serious drawback with this method: we were indexing both fields, rather than indexing one and getting both back.

This isn’t a problem if the two fields use strictly disjoint character spaces, for example one containing only alphabetical characters and the other only numeric characters.  Even if this isn’t the case, we can force it to be using a substitution cipher.  JSON documents are not limited to ASCII characters, and the Unicode character space that UTF-8 encodes is quite large, so we can simply shift one of the fields into a totally different character range when creating the index.

Here’s how the search index would look (credit to Tim Severien’s cipher example on GitHub for the rotate function):

function rotateText(text, rotation) {
    // Surrogate pair limit
    var bound = 0x10000;

    // Force the rotation to be a base-10 integer within bounds, just to be safe
    rotation = parseInt(rotation, 10) % bound;

    // Might as well return the text if there's no change
    if(rotation === 0) return text;

    // Create string from character codes
    return String.fromCharCode.apply(null,
        // Turn string to character codes
        text.split('').map(function(v) {
            // Return current character code + rotation
            return (v.charCodeAt() + rotation + bound) % bound;
        })
    );
}

function(doc){
	index("companyName", rotateText(doc.identifier, 500) + ' ' + doc.name, {"store": true, "facet":true});
}

We’re creating the index on the concatenated string consisting of the cipher-ed identifier and the original name. This means the pair [‘0000051143’, ‘INTERNATIONAL BUSINESS MACHINES CORP’] will become the string “ȤȤȤȤȤȩȥȥȨȧ INTERNATIONAL BUSINESS MACHINES CORP”.
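The arithmetic behind that string is easy to check: the character ‘0’ has code 48, and 48 + 500 = 548, which is U+0224 (“Ȥ”). Here is a minimal sketch of that check. The helper name shiftCode is made up for illustration; it just mirrors the per-character step inside rotateText.

```javascript
// Hypothetical helper (not part of the index function): shifts a single
// UTF-16 code unit by `rotation`, the same way rotateText does per character.
function shiftCode(ch, rotation) {
    var bound = 0x10000;
    return String.fromCharCode((ch.charCodeAt(0) + rotation + bound) % bound);
}

// Shift every digit of the example identifier by 500.
var shifted = '0000051143'.split('').map(function (ch) {
    return shiftCode(ch, 500);
}).join('');

console.log(shifted);               // ȤȤȤȤȤȩȥȥȨȧ
console.log(shifted.charCodeAt(0)); // 548, i.e. 48 ('0') + 500
```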

From here we continue as before, using faceting to return a list of distinct results. It’s then simply a matter of reversing the cipher to get the identifier back (in this example rotate the text by -500). You can either make the very reasonable assumption that no one will enter characters from the cipher-ed character space into your search field, or not allow searches that contain them.
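On the client side, the last two steps might look something like the sketch below. The names decodeFacetValue and containsCipherCharacters are hypothetical, and the code assumes the identifier only ever contains printable ASCII, so the first literal space in a facet value is always the separator added by the index function.

```javascript
// Same rotation as in the index function, repeated so this sketch runs on its own.
function rotateText(text, rotation) {
    var bound = 0x10000;
    return text.split('').map(function (ch) {
        return String.fromCharCode((ch.charCodeAt(0) + rotation + bound) % bound);
    }).join('');
}

var ROTATION = 500; // must match the value used when building the index

// Hypothetical helper: split a facet value at the first space and undo the cipher.
function decodeFacetValue(value) {
    var split = value.indexOf(' ');
    return {
        identifier: rotateText(value.slice(0, split), -ROTATION),
        name: value.slice(split + 1)
    };
}

// Hypothetical guard: true if the query contains a character from the range that
// printable ASCII (0x20 to 0x7E) lands in after being shifted by ROTATION.
function containsCipherCharacters(query) {
    for (var i = 0; i < query.length; i++) {
        var code = query.charCodeAt(i);
        if (code >= 0x20 + ROTATION && code <= 0x7E + ROTATION) {
            return true;
        }
    }
    return false;
}

var decoded = decodeFacetValue('ȤȤȤȤȤȩȥȥȨȧ INTERNATIONAL BUSINESS MACHINES CORP');
console.log(decoded.identifier); // 0000051143
console.log(decoded.name);       // INTERNATIONAL BUSINESS MACHINES CORP
```

If you go the route of rejecting searches, run containsCipherCharacters on the user’s input before it is sent to the search index.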
