By David Kane

  1. Blocking bots by UserAgent in dokku using nginx

    The problem: A website I ran was being aggressively scraped by someone using a particular UserAgent string. It wasn't a normal search engine or similar bot (e.g. GoogleBot), so adding something to robots.txt wasn't going to work. This site is using dokku to serve the web app, dokku …
  2. Where to find geographic boundaries for the UK

    Geographic boundary data is a requirement for any geographic analysis or map-making. But finding boundary data is often a nightmare. This is a problem I've faced too, so I thought it might be useful to outline some of the places to look for boundary data: ONS Open Geography Portal This …
  3. What powers charity websites?

    I wanted to understand what technologies charities are using with their websites. What CMSs do they use to create content and lay out the sites? What web technologies are they using - to what extent have they embraced the latest tech on their sites? And how well are they complying with some …
  4. Better horizontal bar charts with plotly

    I often find that horizontal bar charts are a great way of visualising comparisons between different categories. They're easy to understand and make, and provide a really simple way of displaying data. But I've found the default way of labelling them often doesn't make sense. Labels for the bars are …
  5. How many charity employees have been furloughed?

    This blog post is a write-up of a twitter thread exploring the available data. These are very rough estimates, based on data that's available at the time of writing. It's frustratingly difficult to find out how many charities have used the government's Coronavirus Job Retention Scheme, or how many employees have …
  6. Working with XBRL

    Some notes on the XBRL data format, as I've been working with it this week. I'm working with Power to Change to help them extract data from accounts. What's XBRL/iXBRL? XBRL stands for eXtensible Business Reporting Language. iXBRL is the same but "inline". They describe a data standard for …
  7. Text analysis in the voluntary sector

    Last week at the Data 4 Good event in Birmingham I gave a talk on the opportunities and challenges that text analysis offers to the voluntary sector. It was a short lightning talk so I was only able to give a quick overview of the possibilities, and it's a big …
  8. A map of World Cup stadia using wikidata

    Wikidata is an amazing project that aims to turn the unstructured text of Wikipedia into a database of facts and figures that allows you to go beyond just presenting a page about something to using data about it. I've been wanting to try out using it, and "SPARQL", the language …
  9. Grants in Kingston

    On Tuesday I was lucky enough to take part in a Kingston "Data Day", organised by the fantastic Superhighways and Kingston Voluntary Action. The aim of the day was to showcase some of the work that charities in Kingston are already doing to use data in their work, and see …
  10. Adding charity details using findthatcharity - Part II

    ...continued from part 1. Part II - add charity details: Once you've got a list of organisations in OpenRefine with charity numbers you can then add more details about the organisations using findthatcharity.uk. You can add: postcode, website, latest income, link to Charity Commission register, date registered/removed, Company Number …
  11. Back in the GDPR

    G-day (25 May) is fast approaching, when charities will need to be GDPR-compliant. While half of charities apparently haven't heard of it, there's no shortage of resources, so I've put together a list of the resources I've seen produced on github. If you see anything missing then please add …
  12. Acrostic football league tables

    Following from a question to the Guardian's "The Knowledge" column I've tried to find a longer acrostic than "TABLE". “Has there ever been a longer acrostic spelled out in a table than the oft-recurring ‘TABLE’ from the Premier League this season – or ‘LAWNS’ from League Two?” asks Marco Jackson. To …
  13. Names shared by genders

    Building the gender classifier I've written about here got me interested in ambiguous names - those that are shared by people of both genders. I realised I could use the list of male and female charity trustee names I'd gathered to look into this in a bit more detail. Bearing in …
  14. A name gender classifier

    Something I've needed to do a couple of times is take a long list of names and classify them into male and female. For example, I've looked at lists of people who attended events to see whether they were reaching more men or women - this then helps target future events …
  15. Twitter bot – random charity

    I’ve been playing about with twitter bots, and following some instructions on using the twitter API in python, I’ve created a bot that tweets a link to a random charity every half an hour. The code and more details on the bot can be found on github. From …
  16. Development version of WordPress on Windows 10

    The problem: I need a local development version of WordPress to help test problems with the live site and also test updates. Previously I’ve used WAMP but that tends to get a bit messy, particularly on Windows 10. The solution: use docker to create a container with all the …