Tech Nation 2016 – Part 2

Part 2 of this series takes a look at the underlying data and methods used in the report.

Chris Dymond, 24th February 2016

In this post I’m going to look at how the 2016 Tech Nation Report was compiled and how it has tried to address some of the shortcomings of other reports that try to analyse the digital industry. This is the second of several posts I’m writing about the report; please see the first post in the series for background to the report and its headline findings.

I am going to keep it simple though – don’t worry. This is an overview of the methods used, with some commentary. You can read more detail about the methodology on pages 116-119 of the report itself.

The Tech City UK researchers used a number of different data sets to try to address specific questions posed by the report. I’ll run through these briefly, with comments, at the end of this article. However, the three most significant areas of definition are geography, employment & industry scope, and economic data, so I’ll discuss each of these in order first.

Geography

The basis for the geographies used by the report (i.e. the clusters) is the 2001 Travel to Work Areas (TTWAs) produced by the Office for National Statistics. The report on this was produced in 2007 using data from the 2001 census. (The 2011 census would be more up to date, but the geographies based on it weren’t released until August last year and aren’t yet reflected in the other data sets the report uses.) In addition to these, the report also used on-the-ground guidance from its partner organisations to combine three specific pairs of locations into single clusters: Bristol & Bath; Cardiff & Swansea; and Bournemouth & Poole.

The report doesn’t say this explicitly, but I think it’s apparent that there’s a certain amount of political consideration in the selection of the clusters.
For instance, the smallest cluster in the list is Truro, Redruth & Camborne (one of my old stomping grounds), with just 1,380 people employed in tech, while places like Nottingham, Swindon/Newbury and Aberdeen, whose tech industries I suspect have a greater economic impact, are omitted. I have no data on this, though – it would be great to see how all the tech industry data falls into the TTWA geographies. Which brings us to…

Employment & Industry Scope

Nesta

The main source of data used to determine levels of employment in the digital industries is the work done by Nesta and TechUK, published in the report Dynamic Mapping of the Information Economy, January 2016. In a nutshell, this report attempts to calculate a ‘digital intensity’ value for all industry sectors covered by Standard Industrial Classification (SIC) codes. To do this, it took the information economy occupations identified by the Department for Business Innovation and Skills, defined as Standard Occupational Classification (SOC) codes, and measured the proportion of these occupations relative to other occupations within each industry. All industry sectors that showed a total employment figure higher than 10,000 people and a proportion of digital occupations higher than 30% were classified as being part of the information economy. This filter is used as a baseline, with a number of edge cases and anomalies then considered individually.

The Nesta report is well worth looking at in addition to Tech Nation, as it provides the best view so far produced of the progressive digitisation of the UK economy. This work provides the Tech Nation report with a more robust set of SIC and SOC codes with which to estimate employment, turnover and digital GVA figures (the methodology section of the report explains how these are derived), certainly more robust than the KPMG Tech Monitor report that I looked at a few months ago.
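To make the Nesta baseline filter concrete, here is a minimal sketch in Python of the test as I have described it above. The sector labels and employment figures are made up for illustration (they are not real SIC codes or figures from either report), and the sketch ignores the edge cases and anomalies that the Nesta researchers went on to consider individually.

```python
from dataclasses import dataclass

# Thresholds as described in the text: more than 10,000 people employed,
# and more than 30% of them in information economy occupations.
MIN_EMPLOYMENT = 10_000
MIN_DIGITAL_SHARE = 0.30


@dataclass
class Sector:
    label: str               # hypothetical label, not a real SIC code
    total_employment: int    # everyone employed in the sector
    digital_employment: int  # those in information economy (SOC) occupations

    @property
    def digital_intensity(self) -> float:
        """Share of the sector's workforce in digital occupations."""
        if self.total_employment == 0:
            return 0.0
        return self.digital_employment / self.total_employment


def is_information_economy(sector: Sector) -> bool:
    """Apply the baseline filter: both thresholds must be exceeded."""
    return (
        sector.total_employment > MIN_EMPLOYMENT
        and sector.digital_intensity > MIN_DIGITAL_SHARE
    )


# Invented example data, purely for illustration.
sectors = [
    Sector("A", 50_000, 30_000),   # large, 60% digital
    Sector("B", 8_000, 6_000),     # 75% digital, but too small
    Sector("C", 200_000, 20_000),  # large, but only 10% digital
]

selected = [s.label for s in sectors if is_information_economy(s)]
print(selected)  # only sector "A" passes both thresholds
```

Note that a small but intensely digital sector (like "B" here) fails the baseline, which is presumably one reason individual edge cases needed separate consideration.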
GrowthIntel

However, in addition to the Nesta data, Tech City UK also worked with GrowthIntel to create a ‘Core Business Dataset’ that employs a completely different technique to scope the industry. In essence, GrowthIntel scraped the websites of over 300,000 businesses in the UK, then manually assigned 10% of those to one of 44 sector categories (not SIC codes). They then used a machine learning algorithm to assign the other 90%, based on probabilities derived from analysis of the scraped textual content. In addition, they used text clustering on this data to identify clouds of common phrases, and manually identified those that indicated the provision of digital services or products. This analysis revealed around 500 recurring ‘tags’, 228 of which related to industrial activity, and 44 of those were ‘digital’. Based on this, 58,000 businesses were identified as ‘digital’ out of a total of 320,000.

This analysis was also used to identify sub-sectors, and thus the sub-sector specialisms of each of the clusters contained in the report.

Economic Data

Economic data is surprisingly difficult to evaluate, even if you know which firms you are interested in, as the primary source of data, Companies House, does not require firms with fewer than 50 employees to publish their full accounts. To compensate for this, the Tech Nation researchers used data from a number of other sources, including the Annual Population Survey, the Business Structure Database and the Annual Business Survey. (Interestingly, they did not use DueDil’s data, which was used in the 2015 report. I’m not sure why.)

Other data sets

In addition to these main pieces of analysis, the report also uses the following:

Online job advertisement data from Burning Glass, which they used to estimate advertised average salary and job growth.
Online software development activity from GitHub, which they used to estimate which coding competencies are prevalent in each cluster.
Industry networking data from Meetup.com, which they used to evaluate the relative tech community activity in each cluster.

Personally, I would take these last three pieces of analysis with a fair amount of salt. We know from our own tech ecosystem in Sheffield that much of the employment activity is conducted outside of official recruitment; most open source coding is done by a very small proportion of the total community of programmers; and Meetup.com is a commercial service that fewer than a third of our local tech communities use to coordinate and promote their activity. In fairness, the report doesn’t give these analyses much prominence, and I do think it’s a good idea to start using some alternative measures to characterise clusters in addition to economic ones. I’d like to see more of this, perhaps crowdfunding data or average broadband speeds and cost.

Qualitative Data

In addition to all this, Tech City UK also conducted a survey amongst business owners, which received 1,797 submissions (you may remember filling it out ;), and interviewed 42 people active in the tech industry and cluster networks to gain additional insight into all of the clusters.

To sum all this up, I should note that this is a very high-level overview of the way in which data has been used to compile the Tech Nation report (and I’ve probably done the researchers a disservice by making it seem much more simplistic than it is). Furthermore, despite the transparency of the methodology section of the report, and of the supporting research reports, it’s not completely clear what data was used to derive which specific figures in the report. It also leaves other questions unanswered, such as the extent to which the Nesta and GrowthIntel data corroborated each other. (Any further light readers can shed on this would be very much appreciated – please comment or tweet me.)
This said, what is clear is that Tech Nation 2016 is by far the most comprehensive report of its kind ever produced, and absolutely deserves to be taken seriously. And while the anomalies the report has thrown up will be long debated (and I’m intending to look at some of them in a future post), they point the way to even more accurate data and reporting in future. Tech City UK tacitly acknowledge the shortcomings of current data and methods by including the need to develop these among the report’s five recommendations to support the growth of the industry. If the 2017 report can show as much improvement as this year’s does over 2015, then things are on an excellent track.