Where did this data come from?

Short answer: we combine data from as many useful online sources as you can tell us about, official and unofficial. Longer (technical) answer: if you’re happy to read code, you can look in the file instructions.json that EveryPolitician uses to retrieve and collate the data — see more about this below.

We list the sources we’re using at the bottom of each term’s page (for example, see “Main sources” right at the bottom of the data for the 44th term of Australia’s House of Representatives).

Sometimes the data changes, of course. So we regularly rebuild the data from those sources to keep EveryPolitician up to date.

About those sources

We aggregate from lots of different online sources, such as official parliament sites and unofficial sites (including Wikidata, which is the database on which projects like Wikipedia are based). If you know of a good source that we’re not using, let us know!

We merge our data from multiple sources because it's common for different sources to provide different kinds of data (for example, one source might have politicians' dates of birth, while another has their Twitter handles).
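To make the idea of merging concrete, here is a minimal sketch in Python. It is not EveryPolitician's actual merge code: the merge_records function, the shared "id" key, and the sample records are all hypothetical, and real sources rarely share an ID this conveniently.

```python
def merge_records(*sources):
    """Merge lists of person dicts that share an "id" key (an assumption).

    Each source contributes the fields it has. If two sources supply the
    same field, the value from the earlier source is kept.
    """
    merged = {}
    for records in sources:
        for record in records:
            person = merged.setdefault(record["id"], {})
            for field, value in record.items():
                person.setdefault(field, value)
    return merged


# Hypothetical sample data, not real EveryPolitician records.
official_site = [{"id": "p1", "name": "Alex Example", "birth_date": "1970-01-01"}]
social_media = [{"id": "p1", "twitter": "alex_example"}]

people = merge_records(official_site, social_media)
print(people["p1"])
```

The sketch keeps the first value it sees for each field, which is only one possible policy. A real merge also has to decide how to match people across sources that do not share an ID.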

If any of the sources themselves have clear, consistent IDs, we try to capture those (and include them in the identifiers field within the JSON), because we know that sometimes it can be helpful to be able to map back to the original data sets.
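If you want to map a person back to an original data set, a lookup along these lines may help. This is a sketch that assumes the identifiers field is a list of objects with "scheme" and "identifier" keys (the Popolo convention). The find_identifier helper, the scheme names, and the ID values are made up for illustration, so check them against the JSON you download.

```python
def find_identifier(person, scheme):
    """Return the person's ID under the given scheme, or None if absent.

    Assumes person["identifiers"] is a list of dicts with "scheme" and
    "identifier" keys.
    """
    for entry in person.get("identifiers", []):
        if entry.get("scheme") == scheme:
            return entry.get("identifier")
    return None


# Hypothetical person record; the scheme names and IDs are placeholders.
person = {
    "name": "Alex Example",
    "identifiers": [
        {"scheme": "wikidata", "identifier": "Q0000000"},
        {"scheme": "parliament_site", "identifier": "abc123"},
    ],
}

print(find_identifier(person, "wikidata"))
```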

The available data sources vary immensely from country to country, and locals are the best people to ask about them. So if you know of a good source that we’re not using in your country, let us know! Just pointing out a source to us is helpful; you don’t have to do the hard work of actually extracting the data.

The technical details

You can see exactly where the data’s coming from by looking in the EveryPolitician data repo. Specifically, you want the sources directory for the legislature you’re interested in. Look inside the instructions.json there: that is the file EveryPolitician uses to rebuild its data whenever something changes.

For example, the instructions (containing the explicit sources as well as indications of how to process them) that EveryPolitician follows for putting together its data for Australia’s House of Representatives are in this instructions.json file.

Amongst other things, that file tells you the type of data it’s getting as well as the URL of the resource. It’s common for the resource itself to be the output of a process that gets data from the “raw” source. For example, in the case of Australia, two of the sources (one for determining the terms available, and another for the politicians’ names) are the output of a single web scraper running here: morph.io/tmtmtmtm/australia-openaustralia. If you look at the scraper’s source code, you can see that the scraper itself is getting data from OpenAustralia’s data site.
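If you would rather pull the type and URL of each source out programmatically, something like the following sketch could be a starting point. The key names used here ("sources", "type", "source") and the example.org URLs are assumptions for illustration only. Open a real instructions.json and adjust the names to match what you find there.

```python
import json

# Hypothetical stand-in for the contents of an instructions.json file.
# The key names and URLs are assumptions, not the documented format.
SAMPLE = """
{
  "sources": [
    {"type": "membership", "source": "https://example.org/members"},
    {"type": "term", "source": "https://example.org/terms"}
  ]
}
"""


def list_sources(instructions_text):
    """Return a (type, url) pair for each entry under the "sources" key."""
    instructions = json.loads(instructions_text)
    return [
        (entry.get("type"), entry.get("source"))
        for entry in instructions.get("sources", [])
    ]


for kind, url in list_sources(SAMPLE):
    print(f"{kind}: {url}")
```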