Some notes on the XBRL data format, as I've been working with it this week. I'm working with Power to Change to help them extract data from accounts.
XBRL stands for eXtensible Business Reporting Language. iXBRL is the same but "inline". They describe a data standard for financial reporting, based on XML. In the UK it's used by Companies House and HMRC for reporting - instead of submitting a PDF or paper copy of their accounts (which contains no machine-readable version of the data), companies can (and in some cases must) submit in XBRL format.
The iXBRL format differs slightly from XBRL in that the machine-readable financial data is contained within a human-readable HTML file. That way you get the best of both worlds - documents that can be read by people interested in that company, with data that can be extracted in a consistent way.
That's the theory, anyway.
What's in XBRL?
In my mind, XBRL has two main parts:
- The XBRL format, describing how to represent various concepts within an XML or HTML document. The formats for XBRL and iXBRL are slightly different - you should be able to convert an iXBRL document to an XBRL one, but that wouldn't work the other way round.
- Schemas which are applied on top of the XBRL format to represent financial data in a specific framework - for example the FRS accounting framework or the Charity SORP.
The format describes what the data should look like, while the schema describes what each individual data item refers to.
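To make that split a bit more concrete, here's a rough sketch of pulling a single tagged number out of an iXBRL fragment using Python's standard library. The fragment itself, the `uk-core:TotalAssets` name, the taxonomy URL and the `c1` / `GBP` ids are all made up for illustration, and `list_facts` is just a name I've picked. The `ix:nonFraction` element and its `name`, `contextRef` and `unitRef` attributes come from the format; what the part after the colon in `name` actually means is down to the schema.

```python
import xml.etree.ElementTree as ET

# Namespace used by iXBRL 1.1. Older documents may use
# http://www.xbrl.org/2008/inlineXBRL instead.
IX = "{http://www.xbrl.org/2013/inlineXBRL}"

# Made-up fragment for illustration; real filings are much larger.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL"
      xmlns:uk-core="http://example.com/hypothetical-taxonomy">
  <body>
    <p>Total assets:
      <ix:nonFraction name="uk-core:TotalAssets" contextRef="c1"
                      unitRef="GBP" decimals="0">12,345</ix:nonFraction>
    </p>
  </body>
</html>"""


def list_facts(document):
    """Return one dict per numeric fact (ix:nonFraction) in the document."""
    root = ET.fromstring(document)
    facts = []
    for el in root.iter(IX + "nonFraction"):
        # "uk-core:TotalAssets" -> schema prefix "uk-core", concept "TotalAssets"
        prefix, _, concept = el.get("name", "").partition(":")
        facts.append({
            "schema_prefix": prefix,
            "concept": concept,
            "context": el.get("contextRef"),
            "unit": el.get("unitRef"),
            "text": (el.text or "").strip(),  # still the human-readable string
        })
    return facts


facts = list_facts(SAMPLE)
print(facts)
```

Note that the value comes back as the text a person would read ("12,345"), not as a number. This only works on well-formed XHTML, which is an assumption that may not hold for every file.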
How do you make XBRL?
This isn't a particularly relevant question for me - for this exercise I'm more interested in how to get data out of the documents.
But I think most XBRL documents are produced by standard accounting packages. There are also specialist firms like Hypercube Consultancy who provide tagging services for accounts. Stewart from Hypercube was kind enough to get in touch with me and share his experience of creating and using XBRL and some of the tools and resources out there.
What can you do with XBRL?
This is what I'm interested in - how do you get data out of XBRL documents? My aim is to look for particular values (total assets, for example) and compare them across different organisations.
This is trickier than I had thought. I had hoped that there would be a nice suite of (open source) tools for parsing and extracting data from XBRL - but as far as I can tell there isn't. I did know from trying beforehand that the flexibility of XBRL and iXBRL means it is very complex to deal with - often the same value can be represented in different ways depending on the choices made.
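As one example of the "same value, different representations" problem: in iXBRL the number shown on the page isn't necessarily the number being reported. A tag can carry a `scale` attribute (the text is in thousands, say), a `sign` attribute (the value is negative even though the text has no minus sign) and a `format` attribute saying how the text is written. Here's a minimal sketch of turning those into a single number. The function name `fact_value` is my own, and it only handles a handful of format names that I believe appear in the transformation registries, so treat it as a starting point rather than a complete implementation.

```python
from decimal import Decimal


def fact_value(text, fmt=None, scale="0", sign=None):
    """Convert the displayed text of a numeric fact into a Decimal.

    fmt, scale and sign are the raw attribute values from the tag.
    Only a few format names are handled; anything else is treated as
    "comma for thousands, dot for decimals", which is an assumption.
    """
    local = (fmt or "").split(":")[-1]  # drop the "ixt:" style prefix

    if local in ("zerodash", "numdash", "fixed-zero"):
        # A dash on the page standing in for zero
        value = Decimal(0)
    else:
        cleaned = text.strip().replace(" ", "")
        if local in ("numdotcomma", "num-comma-decimal"):
            # Dot for thousands, comma for decimals: 1.234,50
            cleaned = cleaned.replace(".", "").replace(",", ".")
        else:
            # Comma for thousands, dot for decimals: 1,234.50
            cleaned = cleaned.replace(",", "")
        value = Decimal(cleaned)

    # scale is a power of ten: scale="3" means the text is in thousands
    value = value * (Decimal(10) ** int(scale or 0))

    if sign == "-":
        value = -value
    return value


print(fact_value("12,345"))                        # plain number
print(fact_value("12", scale="3"))                 # reported in thousands
print(fact_value("1,234", sign="-"))               # negative via attribute
print(fact_value("1.234,50", fmt="ixt:numdotcomma"))
```

So two sets of accounts can both report total assets of 12,000 while one displays "12,000" and the other displays "12" with `scale="3"`.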
There are some resources out there though:
- Yeti - a tool created by CoreFiling, who I think were one of the original authors of XBRL. One issue I've had looking at XBRL is navigating the schemas, and this tool makes it much easier to do so.
- Sample iXBRL files - This seems to be the only GitHub presence of XBRL International, who look after the standard, but it's really useful for testing tools.
- Companies House - There are a few ways you can get real-life iXBRL files for companies:
  - Via the Companies House register - if a company has filed iXBRL accounts they're available for download on the filing history page for that company. Not all companies file iXBRL accounts, so where they're not available you'll get PDF copies instead.
  - You can also download accounts in bulk via the Free Accounts Data Product. This has daily and monthly dumps of any XBRL or iXBRL accounts filed with Companies House. This is useful for getting accounts to test on.
- You can also view the official specifications for the format - I found these very difficult to read, and it was hard to get a sense from them of what the resulting data looks like.
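For the bulk downloads mentioned above, here's a small sketch of walking through one of the zip files and picking out the accounts. I'm assuming that iXBRL accounts have an `.html` (or `.xhtml`) extension and plain XBRL accounts have `.xml`, which you'd want to check against a real download. `iter_accounts` is a name I've made up, and the demo at the bottom builds a tiny in-memory zip rather than using real data.

```python
import io
import zipfile


def iter_accounts(zip_source):
    """Yield (filename, kind, raw_bytes) for each accounts file in a zip.

    zip_source can be a path or a file-like object. The mapping from file
    extension to kind is an assumption, not something from a spec.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            name = info.filename.lower()
            if name.endswith((".html", ".xhtml")):
                kind = "ixbrl"
            elif name.endswith(".xml"):
                kind = "xbrl"
            else:
                continue  # skip anything that doesn't look like accounts
            yield info.filename, kind, archive.read(info)


# Demo with a made-up, in-memory zip standing in for a real download
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("company_a.html", "<html/>")
    demo.writestr("company_b.xml", "<xbrl/>")
    demo.writestr("readme.txt", "not accounts")
buffer.seek(0)

found = list(iter_accounts(buffer))
for filename, kind, raw in found:
    print(filename, kind, len(raw))
```

Reading straight from the zip avoids unpacking thousands of small files to disk, which is handy when you just want a sample to test on.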
What have I done?
As these notes suggest, a lot of what I've done is trying to make sense of the landscape and explore the data. I did most of this through Jupyter notebooks, looking through iXBRL files of a selection of community businesses. I've found the Python BeautifulSoup library helpful for parsing the XML.
Using what I'd learnt from exploring the iXBRL files I've started to put together a Python module for parsing iXBRL and extracting the data. I'll be aiming to develop this further, and hopefully use it in anger: https://github.com/drkane/ixbrl-parse.
Who else is doing things with XBRL?
I did a search of GitHub to find XBRL repositories. There were a few people who'd tried to do similar things over the years. But most interestingly, the ONS Big Data Unit has an active repository with some code, approaching it in a very similar way to me. I've got in touch with them to see if they want to compare notes.