On 12-13 November 2016, DataKind UK hosted a DataDive examining corporate ownership data. The event was run in partnership with Global Witness, a not-for-profit organisation that campaigns against environmental and human rights abuses, which often stem from the exploitation of natural resources and corruption in the global political and economic system; see the great video here for a better explanation. They carry out investigations that expose these abuses and came to the DataDive in the hope that the data science volunteers could help them explore the world’s first open data register of “beneficial owners” or “people with significant control” of companies registered in the UK.
The goal of the DataDive was to use data from OpenCorporates to explore the new "beneficial owners" data and see what it reveals about potential cases of tax evasion and corruption. To this end, the 30 or so volunteer data scientists split into 3 teams:
- Team 1: an analytics team to derive initial insights into the "beneficial ownership" data and what it could say about companies within the UK.
- Team 2: could combining this data with other data sets help the register yield new investigative leads?
- Team 3: could we map out the ownership network in such a way that we could query the data in new ways?
I chose to work with Team 3, and ultimately we decided to try to build a graph database out of the data we had at hand. Before I spend the rest of this blog post explaining what we figured out, let me say that the other two teams came up with some fantastic results, and taking a look at the DataKind blog post and Global Witness blog post is well worth your time. For starters, it was found that 3,000 companies had listed as their beneficial owner a company with a tax haven address, in contravention of the rules on declaring beneficial owners.
I do need to give a shout out to Juan and Ned, two friends also at the DataDive on Team 3, who spent the entire weekend trying to figure out how to create unique identifiers for people while also making sure to match the same person across records, e.g. Mr Alfred Hitchcock is the same as Sir Alfred J. Hitchcock. Definitely not an easy task, but they made valiant progress.
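To give a feel for why this matching is hard, here is a minimal Python sketch of one naive approach: strip honorifics and middle names, then compare what is left. This is not what Juan and Ned built; the `TITLES` list and the `name_key` function are hypothetical names made up for illustration.

```python
import re

# Honorifics to ignore. This list is an illustrative assumption, not exhaustive.
TITLES = {"mr", "mrs", "ms", "miss", "dr", "sir", "dame", "lord", "lady"}


def name_key(full_name):
    """Reduce a name to (first given name, surname), lower-cased."""
    # Keep runs of letters only, so "J." becomes "j" and punctuation is dropped.
    tokens = re.findall(r"[^\W\d_]+", full_name.lower())
    tokens = [t for t in tokens if t not in TITLES]
    if not tokens:
        return ("", "")
    # Middle names and initials are discarded by taking only first and last.
    return (tokens[0], tokens[-1])


same_person = name_key("Mr Alfred Hitchcock") == name_key("Sir Alfred J. Hitchcock")
```

A key this crude would also merge two different people who happen to share a first name and surname, so it could only ever be a first pass before comparing other fields.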
Diving into building a new Neo4j database in <24 hours
Looking at the data we had in hand, it was immediately clear that it could be well represented as a network, and hence we thought that graph analytics might offer some innovative approaches to exploring the data. I was one of the team members who took on the challenge of converting our sketch into a usable graph database that could be easily queried and explored.
Since a friend, Stew, who was also at the DataDive, and I had some experience with Neo4j, and since there is a free community edition for experimenting, we chose it as the technical basis for what we would build. As illustrated in the sketch above, we would construct nodes for people (owners), companies, countries and sectors (we didn't complete this last part). The relationships would then connect these nodes together through ownership and location.
The goal ultimately was to build a system that enabled easy querying to find links within corporate structures. So rather than focus on the technical details of how we built it all, I thought I'd share some of the sorts of queries we ended up running and the patterns they revealed.
Visualising patterns of corporate ownership
The following visualisations come directly from running Cypher queries on the Neo4j database that we built. For these renders I've anonymised the individual people and companies, as the purpose here is to demonstrate the sorts of structures and patterns that can be extracted. The colour of each node indicates its type:
- Companies are blue
- People are green
- Countries are red
- Postal codes are purple
These nodes are then joined via a variety of relationships including:
- CONTROLS - indicates a controlling relationship between a person or company and another company
- REGISTERED_IN - indicates the country or postal code where something is registered
- CITIZEN_OF - indicates citizenship of people
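The node types and relationships listed above can be mimicked in a few lines of plain Python. This is only a toy in-memory stand-in for the Neo4j database, and every node name in it (Person A, Company X and so on) is made up for illustration.

```python
from collections import namedtuple

# kind is one of: Company, Person, Country, PostalCode
Node = namedtuple("Node", ["kind", "name"])
# rel is one of: CONTROLS, REGISTERED_IN, CITIZEN_OF
Edge = namedtuple("Edge", ["src", "rel", "dst"])

person_a = Node("Person", "Person A")
company_x = Node("Company", "Company X")
company_y = Node("Company", "Company Y")
uk = Node("Country", "United Kingdom")

edges = [
    Edge(person_a, "CONTROLS", company_x),
    Edge(company_x, "CONTROLS", company_y),
    Edge(company_x, "REGISTERED_IN", uk),
    Edge(company_y, "REGISTERED_IN", uk),
    Edge(person_a, "CITIZEN_OF", uk),
]


def neighbours(node, rel):
    """Return the nodes reached from `node` by following `rel` edges."""
    return [e.dst for e in edges if e.src == node and e.rel == rel]
```

Here `neighbours(person_a, "CONTROLS")` gives the companies that Person A controls directly, which is the building block for the patterns that follow.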
Self-control: One of the goals of the "beneficial ownership" requirement is that you can see which people and corporations ultimately own a company, so it was intriguing to find that a number of companies reported that they were controlled by themselves. In all likelihood this reflects confusion in filing the new ownership information rather than actual attempts at obfuscating ownership.
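As a rough illustration of this check, here is a Python sketch (not the Cypher query itself) that treats each CONTROLS relationship as an (owner, owned) pair and looks for pairs where both ends are the same. The company and person names are placeholders.

```python
# Each tuple is one CONTROLS relationship: (owner, owned). Names are made up.
controls = [
    ("Company A", "Company A"),  # reports itself as its own controller
    ("Company B", "Company C"),
    ("Person 1", "Company B"),
]


def self_controlled(controls):
    """Return the sorted names of entities that list themselves as controller."""
    return sorted({owner for owner, owned in controls if owner == owned})


loops = self_controlled(controls)
```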
Chains of control: It's quite easy to visualise the complexity of control amongst corporations by searching for some of the longest chains of control that exist in the database.
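The idea of a longest chain can be sketched in Python as a depth-first walk along CONTROLS edges that refuses to revisit a node. This is an assumption about how one might do it outside the database, with made-up node names, and it would be far too slow on the full register.

```python
from collections import defaultdict


def longest_chain(controls):
    """Return the longest path (list of names) that follows CONTROLS edges
    without visiting any node twice."""
    graph = defaultdict(list)
    for owner, owned in controls:
        graph[owner].append(owned)

    best = []

    def walk(node, path):
        nonlocal best
        if len(path) > len(best):
            best = list(path)
        for nxt in graph.get(node, []):
            if nxt not in path:  # skip cycles, including self-control
                path.append(nxt)
                walk(nxt, path)
                path.pop()

    for start in list(graph):
        walk(start, [start])
    return best


# Made-up example: P controls A, A controls B and D, B controls C, C controls itself.
chain = longest_chain([("P", "A"), ("A", "B"), ("B", "C"), ("A", "D"), ("C", "C")])
```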
Large corporate structures: One of the largest of the chains above is a partial map of the healthcare company Reckitt Benckiser.
Looking at tax havens: Finding connections to tax-haven countries is relatively straightforward, as shown below in a network of people and UK companies linked back to the British Virgin Islands and the Cayman Islands.
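In Python terms, this kind of lookup amounts to filtering CONTROLS pairs by where the controller is registered. The sketch below assumes a simple `registered_in` mapping and uses only the two jurisdictions named above; all company names are placeholders.

```python
# The two jurisdictions mentioned in the text; a real list would be longer.
TAX_HAVENS = {"British Virgin Islands", "Cayman Islands"}

# Hypothetical REGISTERED_IN data: entity name -> country.
registered_in = {
    "Offshore Co 1": "British Virgin Islands",
    "Offshore Co 2": "Cayman Islands",
    "UK Co A": "United Kingdom",
    "UK Co B": "United Kingdom",
    "UK Co C": "United Kingdom",
}

# Hypothetical CONTROLS data: (owner, owned).
controls = [
    ("Offshore Co 1", "UK Co A"),
    ("Offshore Co 2", "UK Co B"),
    ("UK Co A", "UK Co C"),
]


def haven_links(controls, registered_in, havens):
    """Return (owner, owned, country) for every controller registered in a haven."""
    return sorted(
        (owner, owned, registered_in[owner])
        for owner, owned in controls
        if registered_in.get(owner) in havens
    )


links = haven_links(controls, registered_in, TAX_HAVENS)
```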
Control complexity: We've already seen examples of complexity within companies owning other companies, but this also extends to individual people's ownership/control. In the example below we can see an individual who has a lot of controlling interests in different companies, but some of these companies also report controlling interests in other companies, such that it is not clear what total controlling interest the individual has.
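One way to get a handle on this is to collect everything reachable from a person by following CONTROLS edges, directly or through intermediate companies. The Python sketch below does that with a breadth-first search over made-up data; note that it only says which companies are reachable, not how large the stake in each one is.

```python
from collections import defaultdict, deque


def controlled_by(start, controls):
    """Return every entity reachable from `start` along CONTROLS edges."""
    graph = defaultdict(set)
    for owner, owned in controls:
        graph[owner].add(owned)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Made-up example: Person 1 controls A and B directly, and C and D through them.
controls = [
    ("Person 1", "A"), ("Person 1", "B"),
    ("A", "C"), ("B", "C"), ("C", "D"),
    ("Person 2", "E"),
]
reach = controlled_by("Person 1", controls)
```

In this toy data the person reports two direct interests but reaches four companies in total.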
Mega-owners: One curiosity was to look for who owned the most companies in the UK and the result was somewhat surprising. There are several individuals who report controlling interests in hundreds of companies. However, the companies typically have a single share valued at £1.
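Counting direct controlling interests per owner is the simplest of these patterns. Here is a Python sketch with placeholder names, using `collections.Counter` to rank owners by the number of distinct companies they control directly.

```python
from collections import Counter


def top_owners(controls, n=3):
    """Return the n owners with the most distinct directly controlled companies."""
    # Deduplicate first so a repeated filing does not count twice.
    counts = Counter(owner for owner, _owned in set(controls))
    return counts.most_common(n)


# Made-up CONTROLS data: (owner, owned). The last pair is a deliberate duplicate.
controls = [
    ("Person 1", "A"), ("Person 1", "B"), ("Person 1", "C"),
    ("Person 2", "D"),
    ("Person 1", "A"),
]
ranking = top_owners(controls, n=1)
```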