Short answer: we combine data from as many useful online sources as you can tell us about, official and unofficial.
Longer (technical) answer: if you’re happy to read code, you can look in the file instructions.json that EveryPolitician uses to retrieve and collate the data (see more about this below).
We list the sources we’re using at the bottom of each term’s page (for example, see “Main sources” right at the bottom of the data for the 44th term of Australia’s House of Representatives).
Sometimes the data changes, of course. So we regularly rebuild the data from those sources to keep EveryPolitician up to date.
We aggregate from lots of different online sources, such as official parliament sites and unofficial sites (including Wikidata, the structured database that supports projects like Wikipedia). If you know of a good source that we’re not using, let us know!
We merge our data from multiple sources because it's common for different sources to provide different kinds of data (for example, one source might have politicians' dates of birth, while another has their Twitter handles).
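To give a feel for what merging means here, this is a minimal sketch rather than EveryPolitician's actual code. It assumes each source has already been turned into a list of records sharing a common `id` key; the function name `merge_sources`, the field names, and the sample people are all made up for illustration.

```python
def merge_sources(*sources, key="id"):
    """Combine person records from several sources, matching on a shared key.

    Assumption: every record carries the same key field. Real sources
    usually need a reconciliation step before they can be matched like this.
    """
    merged = {}
    for source in sources:
        for record in source:
            person = merged.setdefault(record[key], {})
            for field, value in record.items():
                # Keep the first non-empty value seen for each field.
                if value not in (None, "") and field not in person:
                    person[field] = value
    return list(merged.values())


# Hypothetical data: one source has dates of birth, another has Twitter handles.
births = [{"id": "p1", "name": "Alice Example", "birth_date": "1970-01-01"}]
twitter = [
    {"id": "p1", "twitter": "alice_example"},
    {"id": "p2", "twitter": "bob_example"},
]

people = merge_sources(births, twitter)
```

In this sketch the earlier source wins when two sources disagree about a field, so the order in which you pass the sources matters.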
If any of the sources themselves have clear, consistent IDs, we try to capture those (and include them in the identifiers field within the JSON), because we know that sometimes it can be helpful to be able to map back to the original data sets.
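As a rough sketch of how you might use those IDs to map back to a source, the example below builds a lookup from a source's ID to the matching person. It assumes each entry in the identifiers field is an object with `scheme` and `identifier` keys (the Popolo convention); the sample record, the `Q0000000` value, and the helper name `index_by_identifier` are hypothetical.

```python
import json

# Hypothetical excerpt; the real JSON for a legislature is far larger.
sample = json.loads("""
{"persons": [
  {"id": "abc", "name": "Alice Example",
   "identifiers": [{"scheme": "wikidata", "identifier": "Q0000000"}]},
  {"id": "def", "name": "Bob Example", "identifiers": []}
]}
""")


def index_by_identifier(persons, scheme):
    """Map each identifier under the given scheme to its person record."""
    index = {}
    for person in persons:
        for ident in person.get("identifiers", []):
            if ident.get("scheme") == scheme:
                index[ident["identifier"]] = person
    return index


by_wikidata = index_by_identifier(sample["persons"], "wikidata")
```

People without an identifier under the requested scheme are simply left out of the lookup.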
The data sources available vary immensely from country to country. And the best people to ask for the best data sources are the locals: so if you know of a good source that we’re not using in your country, let us know! Just pointing out a source to us is helpful; you don’t have to do the hard work of actually extracting the data.
You can see exactly where the data’s coming from by looking in the EveryPolitician data repo. Specifically, you want the sources directory for the legislature you’re interested in. Look inside the instructions.json there, because that is the file EveryPolitician uses to rebuild its data whenever something changes.
For example, the instructions (containing the explicit sources as well as indications of how to process them) that EveryPolitician follows for putting together its data for Australia’s House of Representatives are in this instructions.json file.
Amongst other things, that file tells you the type of data it’s getting as well as the URL of the resource. It’s common for the resource itself to be the output of a process that is getting data from the “raw” source. For example, in the case of Australia, two of the sources (one for determining the terms available, and another for the politicians’ names) are the output of a single webscraper running here: morph.io/tmtmtmtm/australia-openaustralia. If you look at the scraper's source code, you can see that the scraper itself is getting data from OpenAustralia's data site.
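If you’d rather not read the file by eye, a few lines of code can list each source’s type and URL. This is only a sketch: it assumes the file holds a top-level `sources` list whose entries have `type` and `source` keys, which you should check against the real instructions.json for your legislature. The sample entries and example.org URLs are placeholders, not real sources.

```python
import json

# Hypothetical contents; in practice, load the instructions.json from the
# sources directory of the legislature you are interested in.
instructions = json.loads("""
{"sources": [
  {"file": "sources/terms.csv", "type": "term",
   "source": "https://example.org/terms"},
  {"file": "sources/names.csv", "type": "membership",
   "source": "https://example.org/members"}
]}
""")


def summarize_sources(instructions):
    """Return (type, url) pairs for every source entry (assumed key names)."""
    return [
        (entry.get("type", "unknown"), entry.get("source", ""))
        for entry in instructions.get("sources", [])
    ]


for kind, url in summarize_sources(instructions):
    print(f"{kind}: {url}")
```

Using `.get()` with defaults means an entry that lacks one of the assumed keys still shows up in the summary instead of raising an error.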