Short answer: names. And dates of birth, Twitter handles, political group memberships, email addresses, honorific titles, image URLs, and all sorts of useful things, provided we can find them. How complete this data is depends on the sources that people around the world have found for us for each country.
This is an overview of what you can find in our data. You might also want to read about the two data formats available (CSV or JSON), and how the data structure encapsulates entities and relationships.
The CSV files, which are often easier to work with, contain a useful distillation of what’s in the JSON. The fields that are used below to describe the data appear as column headings in the CSV; the same data will always be present in the JSON too, but in a more structured and often richer form. For example, where the CSV file may contain one name for a politician, the JSON data might also include variations under other_names, mapped by language code. The JSON often includes more detailed membership information too, such as start and end dates, and so on. It will usually contain URLs for the sources of our data.
Determine the actual names of the folders and files by looking in the data/ folder, or inside the index file, countries.json.
Country folders, for example: Algeria/, American_Samoa/, Albania/
Legislature folders, for example: Assembly/, House/, Senate/ (many countries only have one)
JSON file: ep-popolo-v1.0.json
CSV files, for example: term-19.csv, term-2011.csv, term-1999-02-16.csv
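To make the folder layout concrete, here is a minimal sketch that builds file paths from a country folder, a legislature folder and a term. The helper names are hypothetical, and the country, legislature and term combinations are illustrative only: always take the real names from the data/ folder or from countries.json.

```python
from pathlib import PurePosixPath

# Top-level folder and JSON filename as named in the text above.
DATA_ROOT = PurePosixPath("data")
POPOLO_FILENAME = "ep-popolo-v1.0.json"


def popolo_path(country: str, legislature: str) -> PurePosixPath:
    """Path to the JSON file for one legislature (hypothetical helper)."""
    return DATA_ROOT / country / legislature / POPOLO_FILENAME


def term_csv_path(country: str, legislature: str, term: str) -> PurePosixPath:
    """Path to the CSV file for one term of one legislature (hypothetical helper)."""
    return DATA_ROOT / country / legislature / f"term-{term}.csv"


# Illustrative combination only; this term may not exist for this legislature.
print(popolo_path("Albania", "Assembly"))            # data/Albania/Assembly/ep-popolo-v1.0.json
print(term_csv_path("Albania", "Assembly", "2011"))  # data/Albania/Assembly/term-2011.csv
```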
CSV field | use & example value |
---|---|
id | unique within EveryPolitician for this politician (so you can use it to track people across terms, for example). It’s a UUID, and looks like 2bc9cc09-a33a-42d9-89c3-14effb20b8b0 |
name | common name for this politician: Anne Example. See more about names |
sort_name | name for sort ordering: Example, Anne |
email | email address: anne@example.com |
twitter | Twitter handle: example (for @example) |
facebook | Facebook name: AnneExample (for facebook.com/AnneExample) |
group | name of political group: Example Faction |
group_id† | id for group |
area | name of area represented: Example County |
area_id† | id for area |
term† | id or index for term |
start_date | start of membership (if needed): 2010-11-29 |
end_date | end of membership (if needed): 2012-07-14 |
image | URL to image file: http://example.com/example.jpg |
gender | gender if known: currently male or female |
† You can consider fields marked † as identifiers that are unique within this legislature (unlike id itself, which identifies the politician within all of EveryPolitician’s data). For example, all politicians in this legislature with the same area_id are representing the same area (which is what you’d expect). Where appropriate, the value itself may be meaningful. For example, for area_id we like to use Open Civic Data IDs like this: ocd-division/country:us/state:ca/cd:47, but if no such mapping is available from our sources, we’ll probably use a slug like area-foo-county. If you can suggest better values whenever you find us using our own ones, please get in touch!
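As a rough illustration of how these identifiers can be used, the sketch below reads two small term CSVs, groups the politicians of one term by area_id, and uses id to see which terms each person appears in. The rows are invented sample data that reuse the example values from the field table, and only a few of the columns are shown.

```python
import csv
import io
from collections import defaultdict

# Invented sample data: one CSV per term, using column headings from the field table.
TERM_18_CSV = """id,name,area,area_id,term
2bc9cc09-a33a-42d9-89c3-14effb20b8b0,Anne Example,Foo County,area-foo-county,18
"""

TERM_19_CSV = """id,name,area,area_id,term
2bc9cc09-a33a-42d9-89c3-14effb20b8b0,Anne Example,Example County,area-example-county,19
11111111-2222-3333-4444-555555555555,Bob Sample,Example County,area-example-county,19
"""


def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by column heading."""
    return list(csv.DictReader(io.StringIO(csv_text)))


term_18 = read_rows(TERM_18_CSV)
term_19 = read_rows(TERM_19_CSV)

# area_id is unique within the legislature, so it is safe to group on it.
names_by_area = defaultdict(list)
for row in term_19:
    names_by_area[row["area_id"]].append(row["name"])

# id is unique across all of EveryPolitician, so it tracks a person across terms.
terms_by_person = defaultdict(set)
for row in term_18 + term_19:
    terms_by_person[row["id"]].add(row["term"])

print(dict(names_by_area))
print(dict(terms_by_person))
```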
For every country, we break the data down into legislatures (for example, the “parliament” or “congress” of the country). Currently, we’re only working with national, that is, top-level, legislatures. In the future we might include local or state-level legislatures too.
Many countries have a single legislature. Some have two. For example, the United States of America has the House of Representatives and the Senate.
Furthermore, we split each legislature’s data into terms. Like legislatures themselves, the way this works may vary from country to country. Terms have a start and end date, and the data therefore describes the membership of the legislature during that period. In most democracies, a new term begins after each national election.
Splitting the data by term is pragmatic because we know online services and sites are likely to be concerned with the country’s current legislature, which means the data for politicians in the current term. Often, but not always, this is also the data we’re most likely to be able to source online, so if you see a country only has one term in its data, it will probably be the most recent one. We’re always interested in adding historical data too (that is, data for previous terms), so if you know where we can get it for your country, please let us know. Similarly, if we are lacking the current term (maybe there’s just been an election?), please let us know: it might be that our source has been updated and we haven’t pulled down the latest changes.
The richness of the data we have will depend on the source or sources from which we’re getting it. We use a wide variety of sources, some official and many more that are not. Indeed, the EveryPolitician project exists partly because in many places it’s still depressingly hard to get even basic information, such as who all the current legislators are.
So the absolute minimum data we need for a politician to be in EveryPolitician is their name (by implication, we also have their membership of the legislature and its term). Unless the system they are in does not use such mechanisms, we also try to have the area they represent, and the group (perhaps a party or faction) they belong to. There may also be start and end dates if we know they did not serve the full term.
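Because start_date and end_date are only filled in when a politician did not serve the full term, code that needs a membership period has to fall back to the term’s own dates. The sketch below shows one way to do that. The function name and the term dates are hypothetical, and it assumes dates are full YYYY-MM-DD values, which may not hold for every source.

```python
from datetime import date


def membership_period(row, term_start, term_end):
    """Return (start, end) for a CSV row, defaulting to the term's dates.

    Hypothetical helper: assumes any date present is a full YYYY-MM-DD string.
    """
    start = row.get("start_date") or ""
    end = row.get("end_date") or ""
    return (
        date.fromisoformat(start) if start else term_start,
        date.fromisoformat(end) if end else term_end,
    )


# Hypothetical term dates for illustration.
TERM_START = date(2010, 1, 1)
TERM_END = date(2014, 1, 1)

full_term = {"name": "Anne Example", "start_date": "", "end_date": ""}
partial_term = {"name": "Bob Sample", "start_date": "2010-11-29", "end_date": "2012-07-14"}

print(membership_period(full_term, TERM_START, TERM_END))
print(membership_period(partial_term, TERM_START, TERM_END))
```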
Names are a key part of any data set based around people. In the basic CSV, the name will be the common name that we’ve got from the source, but we’re aware that names are not quite that simple. So we sometimes have separate fields for:
name
given_name
family_name
honorific_prefix
honorific_suffix
patronymic_name
sort_name
The JSON data includes other_names, which may contain aliases as well as names for the politician in different languages.
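Here is a minimal sketch of pulling the language-specific names and the aliases apart. It assumes each other_names entry is an object with a name key and an optional lang key; that shape, the helper name and the sample person are assumptions for illustration, so check the JSON for your legislature before relying on it.

```python
import json

# Invented sample record; the shape of other_names is an assumption.
PERSON_JSON = """
{
  "id": "2bc9cc09-a33a-42d9-89c3-14effb20b8b0",
  "name": "Anne Example",
  "other_names": [
    {"lang": "fr", "name": "Anne Exemple"},
    {"lang": "de", "name": "Anne Beispiel"},
    {"name": "A. Example"}
  ]
}
"""


def split_other_names(person):
    """Return (names keyed by language code, aliases with no language)."""
    by_lang = {}
    aliases = []
    for entry in person.get("other_names", []):
        lang = entry.get("lang")
        if lang:
            by_lang.setdefault(lang, []).append(entry["name"])
        else:
            aliases.append(entry["name"])
    return by_lang, aliases


person = json.loads(PERSON_JSON)
by_lang, aliases = split_other_names(person)
print(by_lang)
print(aliases)
```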
For convenience, we isolate all the names of all the politicians within each legislature and make them available in their own file. (As with other files, look inside countries.json to find its path and filename; in this case under names.) This file contains name and id fields. Note that there may be more than one name for each politician (for example, we might have the name of a Welsh politician rendered in English and Chinese). If you need to consolidate them, use the id to match them (or dive into the JSON data instead).
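Consolidating the names file by id might look like the sketch below. The rows are invented sample data, and only the name and id columns described above are assumed to exist.

```python
import csv
import io
from collections import defaultdict

# Invented sample of a names file: one row per (name, id) pair.
NAMES_CSV = """name,id
Anne Example,2bc9cc09-a33a-42d9-89c3-14effb20b8b0
Ann Example,2bc9cc09-a33a-42d9-89c3-14effb20b8b0
Bob Sample,11111111-2222-3333-4444-555555555555
Anne Example,2bc9cc09-a33a-42d9-89c3-14effb20b8b0
"""

# Collect every distinct name seen for each politician, keeping first-seen order.
names_by_id = defaultdict(list)
for row in csv.DictReader(io.StringIO(NAMES_CSV)):
    if row["name"] not in names_by_id[row["id"]]:
        names_by_id[row["id"]].append(row["name"])

for politician_id, names in names_by_id.items():
    print(politician_id, names)
```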
In fact, we’ve even built an external service that puts all these names (together with identifier fields, such as id) into a single CSV file (caution: that currently contains well over 80,000 names and growing): see everypolitician-names.
We give each politician in EveryPolitician a unique ID (actually a UUID). It’s keyed as id in the CSV. But often we’ll know useful IDs for the same politician in other data sets too. You can find these in the JSON data under identifiers.
Each identifier consists of an (identifier, scheme) pair. For example, here’s an entry for the identifier in Wikidata (the database behind Wikipedia):
"identifiers": [ { "identifier": "Q3785077", "scheme": "wikidata" } ]
(Incidentally, you can view Wikidata entries like this: wikidata.org/wiki/Q3785077, or using the Reasonator. This example is Estonian politician Taavi Rõivas.)
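Looking up an identifier by scheme can be sketched as follows. The helper name is hypothetical, the record reuses the Wikidata example above, and the lookup returns None when a record has no identifier for the requested scheme.

```python
def identifier_for(entity, scheme):
    """Return the first identifier with the given scheme, or None (hypothetical helper)."""
    for entry in entity.get("identifiers", []):
        if entry.get("scheme") == scheme:
            return entry.get("identifier")
    return None


# Record built from the Wikidata example above.
person = {
    "name": "Taavi Rõivas",
    "identifiers": [{"identifier": "Q3785077", "scheme": "wikidata"}],
}

wikidata_id = identifier_for(person, "wikidata")
wikidata_url = f"https://www.wikidata.org/wiki/{wikidata_id}" if wikidata_id else None

print(wikidata_id)   # Q3785077
print(wikidata_url)  # https://www.wikidata.org/wiki/Q3785077
```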
We do this for other entities as well as politicians: legislatures themselves have identifiers, and we’re adding external identifiers for political parties or factions too. Remember that external data sets might not be complete, so it’s possible to have a useful identifier which isn’t populated for all records. For example, Wikidata only holds data for entries that satisfy its “notability” criteria.
Some legislatures have adopted the good practice of assigning unique IDs to their politicians, which can be useful when using their official APIs. If you know that your country’s legislature provides such IDs, please tell us, and we’ll add them as identifiers in this way.