I’ve been playing with the IBM RAVE engine a bit today, and I tried applying it to the data at xbrl.mybluemix.net. Here’s a text cloud showing the 100 most reported financial concepts. What does this mean? Two things drive how often a concept shows up. First, nearly all filers are likely to report certain concepts, like Assets. Second, a concept that is more highly dimensionalized has more data points, since each dimensional breakdown is reported as a separate fact. For example, here’s General Motors’ breakout of StockholdersEquityIncludingPortionAttributableToNoncontrollingInterest. Very highly dimensionalized.
Unfortunately the query to generate this text cloud takes about 40 seconds to run, so it’s not feasible to have it live.
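
To make the counting idea concrete, here is a rough Python sketch of how the weights behind a text cloud like this could be computed. It is not the actual query I ran. The `top_concepts` helper, the record layout, and the sample facts are all made up for illustration, and I'm assuming each reported data point has already been fetched as a dict with a `concept` key.

```python
from collections import Counter


def top_concepts(facts, n=100):
    """Return the n most reported concepts as (concept, count) pairs.

    `facts` is assumed to be an iterable of dicts with a "concept" key,
    one dict per reported data point (hypothetical layout).
    """
    counts = Counter(fact["concept"] for fact in facts)
    return counts.most_common(n)


# Made-up sample data: every dimensional breakdown is its own fact,
# so a heavily dimensionalized concept piles up data points quickly.
facts = [
    {"filer": "A", "concept": "Assets", "dimensions": {}},
    {"filer": "B", "concept": "Assets", "dimensions": {}},
    {"filer": "A", "concept": "StockholdersEquity", "dimensions": {}},
    {"filer": "A", "concept": "StockholdersEquity",
     "dimensions": {"StatementEquityComponentsAxis": "CommonStockMember"}},
    {"filer": "A", "concept": "StockholdersEquity",
     "dimensions": {"StatementEquityComponentsAxis": "RetainedEarningsMember"}},
    {"filer": "B", "concept": "Revenues", "dimensions": {}},
]

for concept, count in top_concepts(facts, n=2):
    print(concept, count)
```

In this toy data, StockholdersEquity comes from a single filer but still outranks Assets, because its two dimensional breakdowns each count as a data point. The resulting counts would be what sets the word sizes in the cloud.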