“Distinct” Results from a Cloudant Search Index

Search indexes are a really nice feature of Cloudant.  I’d go so far as to say they’re essential, without them what many would consider routine database tasks would be much more complicated.  Let’s look at a simplified set of the data available at xbrl.mybluemix.net.  IBM reporting some financial information over time:

{ "date": "03/31/2010", "company": "IBM", 
"concept":"Assets", "value": 100.00}
{ "date": "06/30/2010", "company": "IBM", 
"concept":"Assets", "value": 110.00}

...repeats...

{ "date": "12/31/2014", "company": "IBM", 
"concept":"Assets", "value": 200.00}
{ "date": "03/31/2010", "company": "IBM", 
"concept":"Current Assets", "value": 50.00}

...continues

Nothing too complicated so far.  Let’s create a search index on the concept field,  it would look like this:

function(doc){
	index("default", 
		doc.company, 
		{"store": true});
	index("concept", 
		doc.concept, 
		{"store": true});
}

So far everything looks good.  In order to populate the second auto-complete combo-box on the main page of xbrl.mybluemix.net we need to search for all the concepts that a company filed that contain a certain string. Let’s suppose we’re looking for all concepts that IBM filed containing the text “Assets”, limited to 10 results. The query URI would look like something this:

https://myaccount.cloudant.com/facts/_design/factsSearches/_search/nameSearch?q="IBM"%20AND%20name:Assets&limit=10

The actual JSON for the results isn’t important yet, what’s important is the response from this URI would contain the following results:

{ "date": "03/31/2010", "company": "IBM", 
"concept":"Assets", "value": 100.00}
{ "date": "06/30/2010", "company": "IBM", 
"concept":"Assets", "value": 110.00}

... repeats ...

{ "date": "06/32/2012", "company": "IBM", 
"concept":"Assets", "value": 150.00}

“Current Assets” is not there! Nor should it be, the search index did exactly what it was asked to do. What we actually wanted was to find distinct concepts with “Assets” in them. In SQL terms we were looking for:

SELECT DISTINCT 
    concept 
FROM 
    facts 
WHERE 
    company='IBM' 
    AND
    concept LIKE '%Assets%'

A naive solution would be to increase the limit, say to 100, then simply process the list on the client or server, keeping only distinct results, before populating the combo-box.  This would work for our simple example, but suppose IBM reported its Assets 100 times, or 1000 times?  Cloudant imposes a maximum limit of 200 results.  If IBM had reported “Assets” 1000 times (note that this is a bit of a contrived example) we’d be out of luck.

Since we only need a list of distinct values of a single field (concept) we can use the counts facet. Simply rebuild the index like so:

function(doc){
	index("company", 
		doc.company, 
		{"store": true});
	index("concept", 
		doc.concept, 
		{"store": true, "facet": true});
}

Now the query URI looks like this:

https://myaccount.cloudant.com/facts/_design/factsSearches/_search/nameSearch?q="IBM"%20AND%20name:Assets&limit=0&ampcounts=%5B"name"%5D

We use limit=0 since we aren’t really concerned with the actual documents, just the list of distinct values, counts=[“name”] gives us this. The JSON resulting from this query would look something like:

{"total_rows":17,"bookmark":"g2o","rows":[],"counts":{"name":{"Assets":16,"CurrentAssets":1}))

That final associative array is what we’re after, a list of permutations of the name field containing “Assets” and how many times each permutation appears.

Caveat: There is no “limit=x” option when working with the counts facet. So if the query were to be something like all documents where the concept contains “e” the query could take a very long time and return an extremely long list of counts.

Advertisements
“Distinct” Results from a Cloudant Search Index

One thought on ““Distinct” Results from a Cloudant Search Index

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s