Standardisation Day: Age at the CBS

Today is standardisation day. Simply put, standardisation means that standards have been agreed upon, so that, for example, all electrical appliances in the Netherlands (and many other countries) work, take voltage and similar technical matters into account, and of course have a fitting plug.

Unfortunately, there are different standards that can be used. The power socket in the United Kingdom, for example, is slightly different from the one in the Netherlands. But thanks to the standards, adapter plugs can fortunately be made to solve this. What I find more annoying is the proliferation of plugs and voltages for laptop chargers (and batteries). In all these years I have not once been able to reuse the charger of a previous laptop. The chance that you can use someone else's charger is also quite small.

Standardisation is also very important in research. Without it, it is hard to compare data from one study with another, or to reuse it in other research. I put this to the test with the age classification used by the CBS (Statistics Netherlands). I searched http://statline.cbs.nl for "leeftijd" (age) and, for the first 20 results, looked at which age classifications are or can be used in the reporting. I found 12 different ones. Partly this makes sense: age in research among primary school children is structured differently than in research on the working population. But others are rather awkwardly chosen, which makes combining them with other data quite hard. Often other classifications are available as well (5-year or 10-year groups). But the classification used in the lifestyle survey (Leefstijlonderzoek) deviates strongly and is hard to compare with other studies:

  • 4-12 years (8-year group)
  • 12-16 years (4-year group)
  • 16-20 years (4-year group)
  • 20-30 years (10-year group)
  • 30-40 years (10-year group)
  • 40-50 years (10-year group)
  • 50-55 years (5-year group)
  • 55-65 years (10-year group)
  • 65-75 years (10-year group)
  • 75 years or older
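
One way to make "hard to compare" concrete is to check which of these groups fall inside a single cell of a plain 10-year grid (0-10, 10-20, and so on). The sketch below assumes the groups are half-open intervals (4 up to, but not including, 12) and skips the open-ended last group; the names LIFESTYLE_GROUPS and fits_grid are made up for this illustration.

```python
# Age groups from the lifestyle survey list above, read as half-open
# intervals [low, high). Assumption: "4-12" means 4 up to (not including) 12.
# The open-ended "75 or older" group is left out of the check.
LIFESTYLE_GROUPS = [(4, 12), (12, 16), (16, 20), (20, 30), (30, 40),
                    (40, 50), (50, 55), (55, 65), (65, 75)]


def fits_grid(low, high, width=10):
    """True if [low, high) lies inside one grid cell of `width` years."""
    return low // width == (high - 1) // width


# Groups that straddle a 10-year boundary cannot be added to a 10-year table.
misfits = [g for g in LIFESTYLE_GROUPS if not fits_grid(*g)]
print(misfits)  # [(4, 12), (55, 65), (65, 75)]
```

Under these assumptions three groups straddle a 10-year boundary, including two that are themselves 10 years wide but start halfway through a decade.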

Quality of IATI Organisation Identifier

One of the big advantages of IATI activity data is that it is possible to find information about a specific organisation and answer questions like:

  • In which countries or sectors is an organisation active?
  • What roles does an organisation have?
  • How many activities is an organisation involved in?
  • What kind of activities is an organisation involved in?
  • With which organisations is an organisation working?

The problem

To merge information from different publishers you can use the organisation name, but that is (to put it mildly) risky. Organisations can be known under different names or spellings (“World Bank” and “The World Bank” are two different names, but I guess they are the same organisation; while two activities with “Freedom Forum” could involve two different organisations). Therefore the attribute “ref” is available in the IATI standard, containing the IATI organisation identifier. The precise format of this identifier is described in the IATI standard.

To be useful, this identifier must be available and correct. To test this we harvested every organisation name with its refs from all published IATI files.


  • In total we found 17987 different organisations.
  • 13224 organisations have an organisation identifier (which does not sound bad) and 4763 organisations have none.
  • However, only 994 organisations have a valid organisation identifier. Of these identifiers 677 are unique, so there are different organisation names with the same identifier.
  • That leaves 16993 (94.5%) organisations without a valid organisation identifier (of which 15867 organisation names are unique).

The solution

We are developing a service to verify organisation identifiers and to suggest corrections. We hope to have this available in January 2015.


Technical background:

  • An organisation is unique based on the combination of organisation name and organisation identifier. So if, say, organisation “world bank” is used in one activity with a correct identifier and in another activity without an identifier, they count as two different organisations.
  • The IATI files as they were known on 15 December 2014 were used for the results.
  • We only checked whether the organisation identifier was well-formed. We did not check whether the ‘base identifier’ was correct. An identifier counts as valid if it is:
    • on the codelist with organisation names of version 1.04
    • or of the format registryCode_someIdentifier
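
The well-formedness check described above could look roughly like the sketch below. It is not the code we actually ran. The codelist and registry codes are tiny illustrative samples, the function names are invented for this post, and the hyphen as separator between registry code and identifier is an assumption (it is exposed as the sep parameter so it can be changed).

```python
# Illustrative samples only; the real values come from the IATI codelists.
SAMPLE_ORG_CODELIST = {"44000", "41114"}
SAMPLE_REGISTRY_CODES = {"NL-KVK", "GB-CHC"}


def is_well_formed(ref, codelist, registry_codes, sep="-"):
    """Check the two rules: on the codelist, or registry code + identifier.

    Assumption: registry code and identifier are joined by `sep`.
    The 'base identifier' after the registry code is not verified.
    """
    ref = (ref or "").strip()
    if not ref:
        return False
    if ref in codelist:
        return True
    for code in registry_codes:
        prefix = code + sep
        # Something must follow the registry code and the separator.
        if ref.startswith(prefix) and len(ref) > len(prefix):
            return True
    return False


def count_valid(organisations, codelist, registry_codes):
    """Count distinct (name, ref) pairs that have a well-formed ref."""
    distinct = set(organisations)
    return sum(1 for _, ref in distinct
               if is_well_formed(ref, codelist, registry_codes))


orgs = [("World Bank", "44000"), ("World Bank", ""),
        ("The World Bank", "44000"), ("Freedom Forum", "NL-KVK-12345678"),
        ("Freedom Forum", "NL-KVK-12345678"), ("Some NGO", "unknown")]
print(len(set(orgs)), count_valid(orgs, SAMPLE_ORG_CODELIST, SAMPLE_REGISTRY_CODES))
```

Because uniqueness is defined on the (name, ref) pair, “World Bank” with and without a ref is counted twice here, just as in the figures above.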

IATI identifier (IATI reporting: It’s all about identifiers)

The IATI identifier should be the key to the information. One of the identifiers in the IATI Standard is the activity identifier, called the IATI identifier. It identifies an activity. It is useful because multiple reporting organisations could report about the same activity. By referencing the activity of another reporting organisation, you can match these reports and determine, for instance, how the money flows.

According to the guidelines, it should:

  • exist
  • be unique
  • start with the identifier of the reporting organisation

How good or bad is the use of the IATI identifier

At this moment (actually yesterday), we were able to import 429,675 activities from 226 reporting organisations (some files are not valid, don’t exist or were not downloadable; see also http://bjwebb.github.io/IATI-Dashboard/index.html).

  • 4866 activities don’t have an identifier
  • 73125 activities don’t have a unique identifier
  • 206198 activities have an identifier which does not start with the reporting organisation identifier
  • Combined: 213843 activities (about 50%) have an IATI identifier which matches the guidelines (and 215832 don’t)

Remarkably, 4829 activities don’t have a reporting organisation identifier.
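
The three rules (exists, is unique, starts with the reporting organisation identifier) can be sketched as below. This is an illustration, not the import code we used; the function name, the input shape (pairs of reporting organisation ref and IATI identifier) and the sample data are all made up.

```python
from collections import Counter


def check_activities(activities):
    """Tally rule violations for (reporting_org_ref, iati_identifier) pairs.

    An activity can break more than one rule, so the violation counts
    may add up to more than the number of activities that are not ok.
    """
    seen = Counter(ident for _, ident in activities if ident)
    tally = {"missing": 0, "not_unique": 0, "wrong_prefix": 0, "ok": 0}
    for org_ref, ident in activities:
        problems = []
        if not ident:
            problems.append("missing")
        else:
            if seen[ident] > 1:
                problems.append("not_unique")
            # Without a reporting organisation ref the prefix cannot match.
            if not org_ref or not ident.startswith(org_ref):
                problems.append("wrong_prefix")
        for problem in problems:
            tally[problem] += 1
        if not problems:
            tally["ok"] += 1
    return tally


# Hypothetical sample data.
sample = [("NL-KVK-1", "NL-KVK-1-A1"),
          ("NL-KVK-1", "NL-KVK-1-A2"),
          ("NL-KVK-1", "NL-KVK-1-A2"),
          ("NL-KVK-1", "XX-A3"),
          ("NL-KVK-1", ""),
          ("", "GB-1-A1")]
print(check_activities(sample))
```

The overlap between the rules is why the separate counts above add up to more than the 215832 activities that fail at least one rule.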

How many reporting organisations are good

170 of 226 reporting organisations have correct activity identifiers. So 56 organisations don’t.

However, the real results are probably worse, because I assume here that the reporting organisation identifier is valid, which is not always true. But that is for another post.


(earlier posted on nivocer.com)

IATI reporting: It’s all about identifiers

A while ago, I started investigating the IATI data. The International Aid Transparency Initiative (IATI) makes information about aid spending easier to access, use and understand. Different stakeholders (ministries of foreign affairs, NGOs, etc.) publish information about budget, spending, participants, results and more about their activities. It is useful to look at the data of an individual organisation, but if you aggregate the data of all organisations, you can answer many more (and more interesting?) questions like:

  • Which organisations are active in Mali?
  • How much money does organisation X receive (and spend)?
  • Which organisation is the most successful considering budget spending, results, etc.?

If you want to compare or aggregate results, it is very important that organisations report in a consistent manner. The IATI Standard defines guidelines and code lists to support this. However, the data contains a lot of violations of these guidelines. In this series I want to report on these violations.