XBRL on IBM Bluemix

Note:  This post makes reference to my employment activities at IBM, please see the Disclaimer.

Something I’ve had going in the background for a little while is using Node.js to stream all the XBRL filing data off the SEC website, store it in a MongoDB instance, and create a web UI to display it. I’ve done every filing ever locally, but hosting the data has proven a little more restrictive, I only have space for the DOW 30. I’m using IBM Bluemix to host the Node.js “UI” portion, and the database building portion is running on a virtual machine that I use inside IBM.

I’ll post more about my experiences with Node.js, XBRL and Mongo a little later, but for now you can play with the app here:

http://xbrl.mybluemix.net

XBRL on IBM Bluemix

CDM Web

Note:  This post makes reference to my employment activities at IBM, please see the Disclaimer.

In my professional capacity I spend most of my time working in product management at IBM for a product called IBM Cognos Disclosure Managment.  Long story short, it’s a windows application that embeds Microsoft Office and allows users to collaborate on creating large documents by breaking the document down into “objects” with security and workflow.  It also connects to various data sources and can embed/refresh the data in said objects.  Since it integrates with desktop Office, there’s always been talk of somehow creating a browser version.  When Microsoft released OWA (Office Web Apps) Server, I thought I’d play around with it.

Note that OWA server is a totally stand alone re-distributable instance of the exact same technology that runs Office 365 (or whatever Microsoft is calling it these days).  Everything is being hosted locally, no content is flowing through Microsoft servers at any point.

Step One: Installing OWA Server

This took a fair bit of time just to provision the environment.

  • Must be installed on Windows Server (I used 2012)
  • Server must be joined to a domain (no idea why)
  • Nothing else “server-y” may be installed like SQL Server.

Anyway, I used these two documents to get me there:

Plan Office Web Apps Server

Deploy Office Web Apps Server

Step Two: Open a Document

Not as easy as it sounds.  Actually installing the server doesn’t get you anything.  There’s no URL to navigate to where you can just open a file, you have to implement a WOPI (Web Application Open Platform Interface) web service to provide information about documents that OWA can actually open.  Fine, I used these two documents to get me most of the way there:

Introducing WOPI

Building an Office Web Apps (OWA) WOPI Host

If I remember rightly this didn’t actually get me all the way there.  I had to do a fair bit of debugging on the OWA server, this article describes where the logs are.  They are very verbose, so I just searched for “error” and “exception”. Also you’ll need to understand what ASP.NET Web API is.

Step Three: Integrate with CDM

I decided to write an MVC 4 web application that would simply talk to the CDM application server on it’s existing web service endpoints.  Since CDM already has a desktop client, I thought this should look like a mobile app, luckily MVC4 has some really nice and easy routes to building a mobile app.

The toughest part for me was actually coming to grips with MVC’s “fuzzy” routing.  I found this routing debugger to be insanely helpful, remember, the order that you set up your routes in matters!  That one caused me to tear my hair out for a few hours.

Step Four: Profit?

Here’s a quick video of the app in action, running inside Chrome emulating a Nexus 5.

CDM Web

Dynamic Typed Languages for Complex Object Models

A depressing amount of my professional life has revolved around XBRL.  For those too lazy to follow the link I’ll summarize by saying it’s an XML based way of describing and providing financial information.  Something that’s been commented on by many but actioned by few is that XBRL can represent multi-dimensional information, so OLAP is a natural storage mechanism.  For those who aren’t aware I currently work at IBM, and since IBM owns an OLAP engine called TM1, I thought I’d play around with whether or not storage of a complete XBRL instance and taxonomy in TM1 is possible.

In order to get started I needed an XBRL processor to load sample documents and attempt to create and populate corresponding concpets in TM1.  This is as far as I’ve gotten.  The most popular XBRL processor seems to be Arelle which is an open source implementation written in Python.  This was my first exposure to Python, and though I’m sure this has been said before, my initial impression is:

  • It would be really easy to bang out a small project in Python.
  • Attempting to work with someone else’s project, particularly in the context of a complex object model seems much more difficult.

The reason is that Python uses dynamic types meaning that auto-complete within an IDE is not available for anything other than static members.  For serious programming neophytes the distinction looks like this, a Python IDE will auto-complete a statement like this (note this is not actual Python, just an example):

File.ReadContents("test.txt")

since ReadContents is a static method of the File class.  However, the second line of this code will be a problem:


myFile = File.Open("test.txt")

myFile.Append("blah")

Since myFile is a dynamic type, it doesn’t have any “methods” until run-time.

This thread on stackoverflow is rife with people claiming support for auto-complete, but as far as I can tell they are referring to base or imported modules, which are well known so an IDE can come pre-loaded with some kind of knowledge of the “types” they provide.  Auto-complete on your own code (or someone else’s .py files, which is the critical thing here) seems to be missing.  So again we come back to:

  • Fast to write your own code, because you know your own types.
  • When working with someone else’s code directly it becomes much tougher because you have to actually inspect the .py files or read through an API reference.

This makes me wonder why Arelle went with Python.  I suspect it’s because they wanted to create XBRL tools first (quickly), and only after thought about exposing an object model.  Maybe I’m way off here, but it seems something like Java or C# would have been better tools for creating a complex object model that is meant for others to use.

Dynamic Typed Languages for Complex Object Models

Farewell tedp.net

I used to maintain a blog/forum/wall type thing at www.tedp.net.  It started when I was an undergraduate at University of Toronto Scarborough.  Each student was assigned something like 10MB of webspace to play with.  I used Perl and CGI to create a blog, and eventually created a very rudimentary user management system so that my classmates could post as well.  They could even submit a profile photo and attach pictures to their posts.  The page “went live” on March 21 2002 at 1:25pm according to the HTML I wrote at the time.  I suppose if I’d had any entrepreneurial ambition at the time I could have beat Facebook to the punch.  I eventually moved it to paid hosting when 10MB became a pretty impractical limit.

I decided recently to take the site down, mainly because it hasn’t been actively updated in years and I didn’t want to pay for the hosting anymore.  But I thought that I should maintain some record of it, as my classmates and I were pretty active in posting there for a few years.  I wrote a small C# program that would connect to the MySQL database that backed the site, download all the post information, and output an HTML document containing each post as it would have been rendered on the site.  A typical post looked something like this:

The HTML document I created was simply hundreds of these stacked on top of one another.  All good so far.  However, I also wanted to print the document to PDF, where page breaks suddenly became an issue.  Simply using Chrome’s “Save as PDF” functionality would result in documents that look like this:

badBreak

Notice how the break occurs in the middle of one “post”.  My first stab at fixing this used the CSS page-break-after property on a div placed between each post like so:

<div style="page-break-after: always;"></div>

This was slightly better, but now each post was on its own page. What I wanted was to suggest that a page break should occur there.  Turns out CSS doesn’t have a way to suggest a page break, but it does have a way to do the opposite, suggest against one.  I simply wrapped each post in a div with the page-break-inside property set as follows:

<div style="page-break-inside: avoid;">
...post goes here...
</div>

Result!

I also ran into some problems with unicode characters not rendering properly. I think because in the early days I used to edit my posts in Word then copy/paste them into the website I ended up with many unicode characters from the MS Windows Character Set in the posts.  They rendered with varying degrees of success but looked terrible in Chrome.  I had problems with left and right single and double quotes, long dashes and upside down question marks.  The reason for the last one is that for a time there was a fad amongst my group of friends for using upside down question marks instead of normal ones on instant messaging and websites like ted.net.  Chris Sorensen started it all.  This is the code I used to strip them out:

postText = postText.Replace("\u0091", "\u0027");
postText = postText.Replace("\u0092", "\u0027");
postText = postText.Replace("\u0093", "\u0022");
postText = postText.Replace("\u0094", "\u0022");

postText = postText.Replace("\u0096", "\u002D");
postText = postText.Replace("\u0097", "\u002D");

postText = postText.Replace("\u00BF", "\u003F");

The full PDF can be found here.  It’s in chronological order, which is the opposite of how it would have been presented on the website, but makes it easier to read as a document.  At first it’s just me, others join in after a few pages.

Farewell tedp.net