What’s in EveryPolitician’s data?

Short answer: names. And dates of birth, twitter handles, political group memberships, email addresses, honorific titles, image URLs, and all sorts of useful things — all provided we can find them. How complete this data is depends on the sources that people around the world have found for us for each country.

This is an overview of what you can find in our data. You might also want to read about the two data formats available (CSV or JSON), and how the data structure encapsulates entities and relationships.

The CSV files, which are often easier to work with, contain a useful distillation of what’s in the JSON. The fields that are used below to describe the data appear as column headings in the CSV; the same data will always be present in the JSON too, but in a more structured and often richer form. For example, where the CSV file may contain one name for a politician, the JSON data might also include variations under other_names, mapped by language code. The JSON often includes more detailed membership information too, such as start and end dates, and so on. It will usually contain URLs for the sources of our data.

Determine the actual names of the folders and files by looking in the data/ folder, or inside the index file, countries.json.

folders for each country
e.g., Algeria/, American_Samoa/, Albania/

folders for each legislature
e.g., Assembly/, House/, Senate/ (many countries only have one)

JSON file (Popolo format) of all data (terms, groups, persons)
e.g., ep-popolo-v1.0.json

CSV file for each term containing persons:
e.g., term-19.csv, term-2011.csv, term-1999-02-16.csv
(in practice, if there’s only one term available, it’ll typically be the current one)
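
As a rough illustration of the layout above, the following sketch walks a local copy of the data/ folder and lists the term CSV files it finds for each country and legislature. The function name `list_term_files` is made up for this example, and it assumes the folder nesting is exactly country, then legislature, then files; countries.json remains the authoritative place to look up real paths.

```python
from pathlib import Path


def list_term_files(data_root):
    """Map (country, legislature) folder names to the term CSV filenames found.

    Assumes the layout <data_root>/<Country>/<Legislature>/term-*.csv.
    """
    found = {}
    for csv_path in sorted(Path(data_root).glob("*/*/term-*.csv")):
        legislature_dir = csv_path.parent
        key = (legislature_dir.parent.name, legislature_dir.name)
        found.setdefault(key, []).append(csv_path.name)
    return found
```

Calling `list_term_files("data")` on a checkout would return a dictionary keyed by pairs such as `("Algeria", "Assembly")`; that particular pairing is only an example, not a claim about the real data.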

CSV field	use & example value
id	unique within EveryPolitician for this politician (so you can use it to track people across terms, for example). It’s a UUID, and looks like `2bc9cc09-a33a-42d9-89c3-14effb20b8b0`
name	common name for this politician: `Anne Example` (see “About names” below)
sort_name	name for sort ordering: `Example, Anne`
email	email address: `anne@example.com`
twitter	Twitter handle: `example` (for `@example`)
facebook	Facebook name: `AnneExample` (for `facebook.com/AnneExample`)
group	name of political group: `Example Faction`
group_id†	id for group
area	name of area represented: `Example County`
area_id†	id for area
term†	id or index for term
start_date	start of membership (if needed): `2010-11-29`
end_date	end of membership (if needed): `2012-07-14`
image	URL to image file: `http://example.com/example.jpg`
gender	gender if known: currently `male` or `female`

† You can consider fields marked † as identifiers that are unique within this legislature (unlike id itself, which identifies the politician within all of EveryPolitician’s data). For example, all politicians in this legislature with the same area_id represent the same area (which is what you’d expect). Where possible, the value is itself meaningful outside EveryPolitician. For example, for area_id we like to use Open Civic Data IDs like this: ocd-division/country:us/state:ca/cd:47, but if no such mapping is available from our sources, we’ll probably use a slug like area-foo-county. If you can suggest better values wherever you find us using our own, please get in touch!
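
To show how the † identifiers can be used, here is a minimal sketch that reads a term CSV and groups politicians by area_id. The sample rows are invented for illustration, only a subset of the columns is shown, and the column order is an assumption; a real term file would be opened from disk rather than from a string.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows using a subset of the column headings described above.
SAMPLE_TERM_CSV = """\
id,name,group,area,area_id
2bc9cc09-a33a-42d9-89c3-14effb20b8b0,Anne Example,Example Faction,Example County,area-example-county
00000000-0000-0000-0000-000000000001,Bob Sample,Other Faction,Example County,area-example-county
00000000-0000-0000-0000-000000000002,Cy Placeholder,Example Faction,Foo County,area-foo-county
"""


def members_by_area(csv_text):
    """Group politician names by the area_id they share within one legislature."""
    by_area = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_area[row["area_id"]].append(row["name"])
    return dict(by_area)


grouped = members_by_area(SAMPLE_TERM_CSV)
for area_id, names in grouped.items():
    print(area_id, names)
```

The same pattern works for group_id. Because these identifiers are only unique within one legislature, it is safer not to merge the groupings from different legislatures.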

Legislatures and terms

For every country, we break the data down into legislatures (for example, the “parliament” or “congress” of the country). Currently, we’re only working with national, that is, top-level, legislatures. In the future we might include local or state-level legislatures too.

Many countries have a single legislature. Some have two. For example, the United States of America has the House of Representatives and the Senate.

Furthermore, we split each legislature’s data into terms. Like legislatures themselves, the way terms work may vary from country to country. Terms have a start and end date, and the data therefore describes the membership of the legislature during that period. In most democracies, a new term begins after every national election.

This is pragmatic because we know online services and sites are likely to be concerned with the country’s current legislature, which means the data for politicians in the current term. Often — but not always — this is also the data we’re most likely to be able to source online, so if you see a country only has one term in its data, it will probably be the most recent one. We’re always interested in adding historic data too (that is, data for previous terms), so if you know where we can get it for your country, please let us know. Similarly, if we are lacking the current term (maybe there’s just been an election?), please let us know — it might be that our source has been updated and we haven’t pulled down the latest changes.

Politicians

The richness of the data we have will depend on the source or sources from which we’re getting it. We use a wide variety of sources, some official and many more that are not. Indeed, the EveryPolitician project exists partly because in many places it’s still depressingly hard to get even basic information, such as who all the current legislators are.

So the absolute minimum we need to include a politician in EveryPolitician is their name (by implication, we also have their membership of the legislature and its term). Where the political system uses them, we also try to record the area they represent and the group (perhaps a party or faction) they belong to. There may also be start and end dates if we know they did not serve the full term.
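
One way to work with the optional membership dates is sketched below: when start_date or end_date is blank, fall back to the dates of the term. This reading of a blank value (that the politician served the whole term) is an assumption made for this example, as are the function name `effective_dates`, the term dates used, and the expectation that every date is a full `YYYY-MM-DD` value.

```python
from datetime import date


def effective_dates(row, term_start, term_end):
    """Return (start, end) for a membership, defaulting to the term's dates.

    `row` is a dict of CSV fields; blank or missing dates are taken to mean
    the politician served from the start, or to the end, of the term.
    """
    start_text = row.get("start_date")
    end_text = row.get("end_date")
    start = date.fromisoformat(start_text) if start_text else term_start
    end = date.fromisoformat(end_text) if end_text else term_end
    return start, end


# Hypothetical term dates, for illustration only.
TERM_START = date(2010, 1, 1)
TERM_END = date(2014, 12, 31)

partial = effective_dates(
    {"name": "Anne Example", "start_date": "2010-11-29", "end_date": "2012-07-14"},
    TERM_START,
    TERM_END,
)
full = effective_dates({"name": "Bob Sample", "start_date": "", "end_date": ""}, TERM_START, TERM_END)
```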

About names

Names are a key part of any data set based around people. In the basic CSV, the name will be the common name that we’ve got from the source, but we’re aware that names are not quite that simple. So we sometimes have separate fields for:

name given_name family_name honorific_prefix honorific_suffix patronymic_name sort_name

The JSON data includes other_names, which may contain aliases as well as names for the politician in different languages.

If you only want names

For convenience, we isolate all the names of all the politicians within each legislature and make them available in their own file. (As with other files, look inside countries.json to find its path and filename; in this case, under names.) This file contains name and id fields. Note that there may be more than one name for each politician (for example, we might have the name of a Welsh politician rendered in English and Chinese). If you need to consolidate them, use the id to match them (or dive into the JSON data instead).
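
Consolidating names by id might look something like the following sketch. The sample rows and shortened ids are invented, and the column order (name, then id) is an assumption; the only thing taken from the description above is that the file has name and id fields and may list a politician more than once.

```python
import csv
import io
from collections import defaultdict

# Hypothetical contents of a names file: one politician appears twice.
SAMPLE_NAMES_CSV = """\
name,id
Anne Example,id-anne
A. Example,id-anne
Bob Sample,id-bob
"""


def names_by_id(csv_text):
    """Collect every distinct name recorded against each politician id."""
    names = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["name"] not in names[row["id"]]:
            names[row["id"]].append(row["name"])
    return dict(names)


consolidated = names_by_id(SAMPLE_NAMES_CSV)
for person_id, person_names in consolidated.items():
    print(person_id, person_names)
```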

In fact, we’ve even built an external service that puts all these names (together with identifier fields, such as id) into a single CSV file (caution: that currently contains well over 80,000 names and growing): see everypolitician-names.

IDs to other data sets, including Wikidata

We give each politician in EveryPolitician a unique ID (actually a UUID). It’s keyed as id in the CSV. But often we’ll know useful IDs for the same politician in other data sets too. You can find these in the JSON data under identifiers.

Each identifier consists of an (identifier, scheme) pair. For example, here’s an entry for the identifier in Wikidata (the database behind Wikipedia):

"identifiers": [
  {
    "identifier": "Q3785077",
    "scheme": "wikidata"
  }
]

(Incidentally, you can view Wikidata entries like this: wikidata.org/wiki/Q3785077, or using the Reasonator. This example is Estonian politician Taavi Rõivas.)
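
Looking up an identifier by its scheme can be sketched as follows. The person record here is a cut-down, hand-written example containing only a name and the identifiers entry shown above; the helper name `identifier_for` is made up for this illustration, and real records in the JSON file carry many more fields.

```python
import json

# A minimal person record, reduced to the fields this example needs.
PERSON_JSON = """
{
  "name": "Taavi Rõivas",
  "identifiers": [
    {"identifier": "Q3785077", "scheme": "wikidata"}
  ]
}
"""


def identifier_for(entity, scheme):
    """Return the first identifier recorded under `scheme`, or None if absent."""
    for entry in entity.get("identifiers", []):
        if entry.get("scheme") == scheme:
            return entry.get("identifier")
    return None


person = json.loads(PERSON_JSON)
wikidata_id = identifier_for(person, "wikidata")
# Build a link of the form mentioned above when an identifier is present.
wikidata_url = f"https://www.wikidata.org/wiki/{wikidata_id}" if wikidata_id else None
print(wikidata_url)
```

Returning None for a missing scheme matters in practice, because not every record has an identifier in every external data set.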

We do this for other entities as well as politicians, such as legislatures themselves, and we’re adding external identifiers for political parties or factions too. Remember that external data sets might not be complete, so a useful identifier may not be populated for every record. For example, Wikidata only holds entries that satisfy its “notability” criteria.

Some legislatures have adopted the good practice of assigning unique IDs to their politicians, which can be useful when using their official APIs. If you know that your country’s legislature provides such IDs, please tell us, and we’ll add them as identifiers in this way.