Part II - add charity details
Once you've got a list of organisations in OpenRefine with charity numbers you can then add more details about the organisations using findthatcharity.uk. You can add:
- postcode
- website
- latest income
- link to Charity Commission register
- date registered/removed
- Company Number
1: Fetch JSON data
First you fetch data from the findthatcharity page for that charity. We'll be getting the data in "JSON" format - this means we can get the fields we need out of the data.
To fetch the data, click the arrow menu next to the column with the charity numbers and
select Edit Column > Add column by fetching URLs...
.
In the resulting dialog you need to add a formula which will create the URL that needs to be fetched. This means adding the charity number value into a template. The formula you need is:
"https://findthatcharity.uk/charity/" + value + ".json"
Make sure you include the quotation marks in the correct place. Add a new column name (something like "Charity Details"). The "Throttle delay" setting helps to lessen the load on the site being fetched from - the default value is 5000 milliseconds (5 seconds). Findthat charity can cope with more requests than that (within reason!) So I'll generally set it to 50-100 milliseconds, which works out as between 10 and 20 requests per second.
The resulting dialog should look something like the following. You'll see a preview of the
url which should look sensible. You can test that the URL is correct by visiting one of the
preview URLs. You'll also see that columns with a null
value also produce a null
URL
so won't be fetched.
2. Extract details from the JSON data
You should see a new column filled with JSON data - it looks very hard to read, but all the data we'll looking for. You'll need to get OpenRefine to look through this data and extract the fields you're looking for.
I'll demo this with the latestincome
and company number
fields, but the same principle
can be used to extract anything from the data.
Use the arrow menu next to the new column, and then select Edit Column > Add column based on this column...
Again, we'll enter a formula to parse the JSON and find the right value within. We're using
the parseJson
OpenRefine function. Enter the following formula:
parseJson(value)["latest_income"]
Enter a name for the column and press OK to add it.
Company number
Adding a company number is slightly more complicated. This is because the company_number
field can contain multiple values (if a charity is linked with more than one company).
So the formula we'll use in this instance is:
parseJson(value)["company_number"][0]["number"]
The [0]
part selects the first value within the list of values of company_number
, and
then ["number"]
selects the company number within that.
3. Save the data
While OpenRefine is useful for this kind of data cleaning and linking, you'll want to
export the data to another programme to perform analysis. You can do this using the
Export
menu found in the top right hand corner. The default formats are pretty
straightforward, and it's possible to create more sophisticated exports (including a JSON
export) using the Custom tabular exporter...
option.
One thing to remember with export - it will use any active filters when exporting, so make sure you don't have restrictive filters that exclude data you actually need.
4. Optional - using Charity Base
The data found on findthatcharity.uk is minimal as it is only included to support the site's main purpose of being a search engine. If there are other details you need then you may need to look for other data sources.
A good place to look is CharityBase which has more detailed data extracted from the Charity Commission.
A sample formula for getting a CharityBase URL for extracting data might look like:
"https://charitybase.uk/api/v0.2.0/charities?subNumber=0&charityNumber=" + value + "&fields=mainCharity.income,registration,beta.people.employees"
(Note that you need to specify the fields you want to return - here we're looking for the number of employees).
And the sample formula for parsing the resulting JSON data might look like:
parseJson(value)["charities"][0]["beta"]["people"]["employees"]