Adding charity details using findthatcharity - Part II

...continued from part 1

Part II - add charity details

Once you've got a list of organisations in OpenRefine with charity numbers you can then add more details about the organisations using findthatcharity.uk. You can add:

  • postcode
  • website
  • latest income
  • link to Charity Commission register
  • date registered/removed
  • Company Number

1: Fetch JSON data

First you fetch data from the findthatcharity page for that charity. We'll be getting the data in "JSON" format - this means we can get the fields we need out of the data.

To fetch the data, click the arrow menu next to the column with the charity numbers and select Edit Column > Add column by fetching URLs....

Screenshot of OpenRefine option for adding column based on an url

In the resulting dialog you need to add a formula which will create the URL that needs to be fetched. This means adding the charity number value into a template. The formula you need is:

"https://findthatcharity.uk/charity/" + value + ".json"

Make sure you include the quotation marks in the correct place. Add a new column name (something like "Charity Details"). The "Throttle delay" setting helps to lessen the load on the site being fetched from - the default value is 5000 milliseconds (5 seconds). Findthat charity can cope with more requests than that (within reason!) So I'll generally set it to 50-100 milliseconds, which works out as between 10 and 20 requests per second.

The resulting dialog should look something like the following. You'll see a preview of the url which should look sensible. You can test that the URL is correct by visiting one of the preview URLs. You'll also see that columns with a null value also produce a null URL so won't be fetched.

Preview of the Add column by fetching from URL dialog

2. Extract details from the JSON data

You should see a new column filled with JSON data - it looks very hard to read, but all the data we'll looking for. You'll need to get OpenRefine to look through this data and extract the fields you're looking for.

I'll demo this with the latestincome and company number fields, but the same principle can be used to extract anything from the data.

Use the arrow menu next to the new column, and then select Edit Column > Add column based on this column... Again, we'll enter a formula to parse the JSON and find the right value within. We're using the parseJson OpenRefine function. Enter the following formula:

parseJson(value)["latest_income"]

Enter a name for the column and press OK to add it.

Dialog for adding a new column by parsing JSON

Company number

Adding a company number is slightly more complicated. This is because the company_number field can contain multiple values (if a charity is linked with more than one company). So the formula we'll use in this instance is:

parseJson(value)["company_number"][0]["number"]

The [0] part selects the first value within the list of values of company_number, and then ["number"] selects the company number within that.

3. Save the data

While OpenRefine is useful for this kind of data cleaning and linking, you'll want to export the data to another programme to perform analysis. You can do this using the Export menu found in the top right hand corner. The default formats are pretty straightforward, and it's possible to create more sophisticated exports (including a JSON export) using the Custom tabular exporter... option.

One thing to remember with export - it will use any active filters when exporting, so make sure you don't have restrictive filters that exclude data you actually need.

4. Optional - using Charity Base

The data found on findthatcharity.uk is minimal as it is only included to support the site's main purpose of being a search engine. If there are other details you need then you may need to look for other data sources.

A good place to look is CharityBase which has more detailed data extracted from the Charity Commission.

A sample formula for getting a CharityBase URL for extracting data might look like:

"https://charitybase.uk/api/v0.2.0/charities?subNumber=0&charityNumber=" + value + "&fields=mainCharity.income,registration,beta.people.employees"

(Note that you need to specify the fields you want to return - here we're looking for the number of employees).

And the sample formula for parsing the resulting JSON data might look like:

parseJson(value)["charities"][0]["beta"]["people"]["employees"]