In the previous post we were trying to use a search index to replicate the following SQL:
SELECT DISTINCT concept FROM facts WHERE company='IBM' AND concept LIKE '%Assets%'
We were able to do this thanks to faceting. Let’s now turn our attention to the other combo-box on xbrl.mybluemix.net. Here we need to search companies, not financial concepts. Why not just use faceting again? There’s a fly in the ointment, a company’s name may not be unique, what we’re interested in is a distinct list of id/name pairs. In SQL this would mean:
SELECT DISTINCT identifier, name FROM companies WHERE name LIKE '%International%'
Faceting won’t help here. It can give us a distinct list of identifiers and a distinct list of names, but not id/name pairs.
One solution is to simply concatenate the two fields when creating the index. Recall from this post that the fields being indexed need not actually exist, they can be created at run-time. Such a search index would look something like this:
function(doc){ index("identifierNameConcat", doc.identifier + ' :: ' + doc.name, {"store": true, "facet": true}); }
Whatever separator we decide to use it obviously must not appear in either the “id” or “name” field. Note the spaces in the separator, we don’t want “0000051143::International” to be a single token (we want “0000051143 :: International Business Machines”). We can now use a query URL like this:
http://account.cloudant.com/companies/_design/companiesSearchIndexes/_search/nameSearch?q=identifierNameConcat:International&counts=%5B"identifierNameConcat"%5D&limit=0
There is a serious drawback with this method. The two fields need to be very different, ideally something like strictly numeric and strictly alphabetic. Since we are technically indexing both id and name the results of our query will contain documents with the query string appearing in either. For example, searching for “000*” will return “0000051143 :: International Business Machines”. This is probably not what we intended. It goes the other way as well, if we were trying to find a company whose name started with “100” we’d get back every company that had an identifier beginning with “100”. In SQL terms what we actually ended up creating was something this:
SELECT DISTINCT identifier, name FROM companies WHERE identifier LIKE '%International%' OR name LIKE '%International%'
Oops. This is serious enough that this method is probably unusable in most cases. I present it here in case someone does find it both usable and useful.
[…] from a previous post that we were trying to get multiple “distinct” fields from a Cloudant search index. […]