How We Import Data

EveryPolitician exists to share politicians' data — but before we can put it out, we have to get it in. Our general requirements for importing data are described on this page.

Read on if you have a data set you think we could use, or if you're one of the wonderful people thinking of writing a scraper for us.

Our preferred format for data that we import is a simple comma-separated values (CSV) file.

At a minimum this should have fields for:

  • id, or identifier__xxx: a unique identifier for the politician

    We add a unique id to every politician we import into EveryPolitician (it's a UUID, and if we know two records are for the same politician, we give them both the same id). If you provide your own id, we can use it to identify the politician during the import, but it will be overwritten with ours.

    However, if that ID would be useful in the EveryPolitician data because it maps to some external data set, you can make it persist by using the name identifier__xxx (note: two underscores), where xxx indicates the local ID scheme it is from. Our import mechanism treats this like another id, that is, recognises that it's a unique identifier (albeit perhaps with local scope, such as its country).

    For example, the UK parliament allocates a unique ID to every politician within its own system, called PIMS, so when we import that data we add that ID as identifier__pims in the CSV. It is combined with IDs from any other sources (in the example below we've also added Wikidata, as identifier__wikidata), and it ends up in the JSON as an (identifier, scheme) pair. Here's an extract from a JSON Popolo file showing how this appears within a person's entry:

    "id": "e09079a7-6609-4fe8-93b4-bd499637e130",
    "identifiers": [
      {
        "identifier": "4734",
        "scheme": "pims"
      },
      {
        "identifier": "Q258473",
        "scheme": "wikidata"
      }
    ],
          

  • name: their name
  • area: the constituency/district they represent (if appropriate)
  • group: the party or faction they’re part of (if appropriate)
  • term: the legislative period this membership represents (e.g. ‘19’ for the Nineteenth Assembly)
  • start_date: if the person joined later than the start of the term
  • end_date: if they left before the end of the term
  • source: the URL of the main source for this information (for example, if this data has been scraped from a single page or API call for this person then that's ideal; but if it's just an entry on a page listing many politicians, then that page's URL is fine)
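To make the identifier__xxx convention concrete, here is a minimal Python sketch of how such columns could be turned into (identifier, scheme) pairs like the ones in the JSON extract above. It is an illustration only, not EveryPolitician's actual import code: the function name extract_identifiers is hypothetical, the name in the sample row is a placeholder, and the identifier values are the ones from the extract.

```python
import csv
import io

PREFIX = "identifier__"  # note: two underscores


def extract_identifiers(row):
    """Return one {identifier, scheme} dict per non-empty identifier__xxx column."""
    return [
        {"identifier": value, "scheme": column[len(PREFIX):]}
        for column, value in row.items()
        if column and column.startswith(PREFIX) and value
    ]


# Sample input: the plain "id" column is ignored here because, as described
# above, it is only used during import and then replaced by a UUID.
sample_csv = (
    "id,identifier__pims,identifier__wikidata,name\n"
    "1681,4734,Q258473,Example Person\n"
)

rows = list(csv.DictReader(io.StringIO(sample_csv)))
identifiers = extract_identifiers(rows[0])
print(identifiers)
```

The scheme is simply whatever follows the double underscore in the column name, so a file can carry as many identifier__xxx columns as it has external ID schemes.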

If you have data for multiple legislative periods (and the more the better!) these can either be included in the same file, or provided as one file per term.
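If your data is in one combined file and you would rather supply one file per term, the split is straightforward. The following Python sketch groups rows by the term column and produces one CSV text per term. The function name split_by_term and the sample rows are made up for illustration, and nothing here is required: a single combined file is just as welcome.

```python
import csv
import io
from collections import defaultdict


def split_by_term(csv_text):
    """Return a dict mapping each term to CSV text holding only that term's rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    by_term = defaultdict(list)
    for row in reader:
        by_term[row["term"]].append(row)

    outputs = {}
    for term, rows in by_term.items():
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer, fieldnames=reader.fieldnames, lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        outputs[term] = buffer.getvalue()
    return outputs


# Hypothetical combined data covering two terms.
combined = (
    "id,name,term\n"
    "1681,Joe Smith,18\n"
    "1681,Joe Smith,19\n"
    "1702,Ann Jones,19\n"
)

per_term = split_by_term(combined)
for term, text in sorted(per_term.items()):
    print(f"term {term}:\n{text}")
```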

If someone changed party/faction affiliation in the middle of the term, you should include two entries, with the relevant start/end dates set. For example:

id,name,area,group,term,start_date,end_date,source
1681,Joe Smith,Easthill,White Party,19,,2011-04-14,http://example.com/jsmith
1681,Joe Smith,Easthill,Black Party,19,2011-04-14,,http://example.com/jsmith
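If you are generating the CSV from a scraper, a mid-term change like this can be produced by copying the membership row and adjusting the group and dates. Here is a minimal Python sketch, assuming the change date is used both as the end_date of the first entry and as the start_date of the second; the helper name split_membership is hypothetical, and the values are those of the example above.

```python
import csv
import io

FIELDS = ["id", "name", "area", "group", "term", "start_date", "end_date", "source"]


def split_membership(row, change_date, new_group):
    """Return two rows: the old group ending on change_date, the new one starting then."""
    before = dict(row, end_date=change_date)
    after = dict(row, group=new_group, start_date=change_date)
    return [before, after]


membership = {
    "id": "1681",
    "name": "Joe Smith",
    "area": "Easthill",
    "group": "White Party",
    "term": "19",
    "start_date": "",  # empty: joined at the start of the term
    "end_date": "",    # empty: stayed until the end of the term
    "source": "http://example.com/jsmith",
}

rows = split_membership(membership, "2011-04-14", "Black Party")

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Leaving start_date or end_date empty keeps the meaning described in the field list: the membership runs from the start of the term, or to the end of it.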

Other fields we can automatically process include:

  • given_name
  • family_name
  • honorific_prefix
  • honorific_suffix
  • patronymic_name
  • sort_name
  • email
  • phone
  • fax
  • cell
  • gender
  • birth_date
  • death_date
  • image
  • summary
  • national_identity
  • twitter
  • facebook
  • blog
  • flickr
  • instagram
  • wikipedia
  • website

Ideally you would publish this file somewhere online, and keep it updated with changes, but if that’s too much work, we’ll happily accept a one-time file (perhaps with occasional later updates). Either way, we’re happy to chat first about what will be easiest for both you and us! Just drop us an email to team@everypolitician.org, and let’s get more data released!