ESS Employer Skills Survey
EU European Union
FILCO Fuente de Información Laboral de Colombia
GDP Gross Domestic Product
GEIH Gran Encuesta Integrada de Hogares
HSEQ Health, Safety, Environment & Quality
HTML Hypertext Markup Language
IALS International Adult Literacy Survey
ICT Information and Communications Technology
IDB Inter-American Development Bank
IER Warwick Institute for Employment Research
ILO International Labour Organization
IP Internet Protocol
ISCO International Standard Classification of Occupations
ISIC International Standard Industrial Classification of All Economic Activities
ISO International Organization for Standardization
IT Information Technology
LASSO Least Absolute Shrinkage and Selection Operator
LEFM Local Economy Forecasting Model
LFS Labour Force Survey
LTDA Limitada
MAC Migration Advisory Committee
MEN Ministerio de Educación Nacional de Colombia
N&E New and Emerging (Occupations)
NIF Normas de Información Financiera
NIIF Normas Internacionales de Información Financiera
NOS National Occupational Standards
NQF National Qualifications Framework
OECD Organisation for Economic Co-operation and Development
OEI Organización de Estados Iberoamericanos
OLS Ordinary Least Squares
O*NET Occupational Information Network
ONS Office for National Statistics
OSP Occupational Skills Profiles
OVATE Skills Online Vacancy Analysis Tool
PHP Hypertext Preprocessor
PIAAC Programme for the International Assessment of Adult Competencies
PISA Programme for International Student Assessment
RSPO Roundtable on Sustainable Palm Oil
RUES Registro Único Empresarial
SENA Servicio Nacional de Aprendizaje
SEO Search Engine Optimization
SIC Standard Industrial Classification
SMEs Small and Medium-Sized Enterprises
SMMLV Salario mínimo mensual legal vigente (current statutory monthly minimum wage)
SNIES Sistema Nacional de Información de Educación Superior
SNPP Sub-National Population Projections
SOC Standard Occupational Classification
SQA Software Quality Assurance/Advisor
SQL Structured Query Language
SST System Support Team
SSTA Gestión en seguridad, salud en el trabajo y ambiente (occupational safety, health and environmental management)
STEP Skills Measurement Program
SVM Support Vector Machine
TAT Store-to-store (for its acronym in Spanish)
TVET Technical and Vocational Education and Training
UAESPE Unidad Administrativa Especial del Servicio Público de Empleo
UK United Kingdom
UKCES UK Commission for Employment and Skills
US United States
VET Vocational Education and Training
XML Extensible Markup Language
1. Introduction
This book studies how, and to what extent, a web-based system to monitor skills and skill mismatches could be developed for Colombia using information from job portals. More specifically, it seeks to answer two questions: 1) How can information from job portals be used to inform policy recommendations? 2) Given two of the major labour market problems in Colombia, high unemployment and high informality, to what extent can information from job portals (unsatisfied labour demand) and national household surveys (labour supply) be combined to provide insights into skill mismatches in a developing economy?
Accordingly, this book investigates the challenges, advantages, and limitations of collecting information from job portals and proposes a framework to test the validity of this information for economic analysis. It conducts an innovative labour market analysis and develops indicators based on up-to-date, robust labour demand (job portal) and labour supply (household survey) information to tackle skill mismatches, thus extending the use of novel sources of information to areas not yet explored in the labour economics literature.
In doing so, this study makes conceptual, methodological, and empirical contributions to the ongoing debate in economics about the use of job portal information for labour demand analysis. The main conceptual contribution is to demonstrate that Big Data concepts and sources (in this case, job portals) can provide consistent results to guide public policy (see Chapters 7 to 9). This document also demonstrates that, with the proper techniques, information from job portals can meet the conceptual requirements for high-quality data for labour market analysis (see Chapters 4 and 10).
The main methodological contribution is the development of a detailed framework and set of methods to collect, clean, and organise vacancy data (e.g. web scraping, occupation and skill identification), which allows this source of information to be tested and analysed for consistent labour market insights. Specifically, this book contributes to the methodology of processing job portal information for public policy advice by: 1) discussing different criteria (volume, website quality, and traffic ranking) for selecting the most relevant and trustworthy job portals from which to collect vacancy information (Chapter 5); 2) providing a detailed explanation of Big Data techniques (web scraping) and the challenges they pose for automatically collecting job advertisements from job portals (Chapter 5); 3) applying mixed-methods approaches (text mining, word-based matching methods, etc.) to standardise information collected from different job portals into a single database for statistical analysis (Chapter 6); 4) implementing and extending a mixed-methods approach (stop words, stemming, extensions of a machine learning algorithm, etc.) to identify skills and occupations in online job advertisements (Chapter 6); and 5), importantly, using this extended mixed-methods approach (e.g. a skills dictionary to identify skill patterns) to find new or specific skills and occupations in the Colombian labour market that would otherwise be difficult to identify by other means, such as household surveys (Chapter 6).
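To make the dictionary-based skill identification step more concrete, the following is a minimal Python sketch, not the book's implementation. It assumes a hypothetical `STOP_WORDS` set, a hypothetical `SKILLS_DICTIONARY` mapping canonical skills to Spanish phrases, and a crude plural-stripping `stem` function standing in for a proper Spanish stemmer (such as Snowball); all names and entries are illustrative. Ads and dictionary phrases pass through the same normalisation (lowercasing, accent removal, stop-word removal, stemming) before contiguous token sequences are matched.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny Spanish stop-word list (a real list is much longer).
STOP_WORDS = {"de", "la", "el", "en", "y", "con", "para", "los", "las",
              "del", "un", "una", "se", "requiere"}

# Hypothetical skills dictionary: canonical skill -> phrases that signal it in an ad.
SKILLS_DICTIONARY = {
    "excel": ["excel", "hojas de calculo"],
    "sql": ["sql", "bases de datos"],
    "atencion al cliente": ["atencion al cliente", "servicio al cliente"],
    "trabajo en equipo": ["trabajo en equipo"],
}


def stem(token: str) -> str:
    """Crude plural stripping; a stand-in for a real Spanish stemmer."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def normalise(text: str) -> list[str]:
    """Lowercase, strip accents, tokenise, drop stop words, and stem."""
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    tokens = re.findall(r"[a-z0-9]+", text)
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def contains_sequence(tokens: list[str], pattern: list[str]) -> bool:
    """True if `pattern` occurs as a contiguous run of tokens."""
    n = len(pattern)
    if n == 0:
        return False
    return any(tokens[i:i + n] == pattern for i in range(len(tokens) - n + 1))


def find_skills(ad_text: str) -> set[str]:
    """Canonical skills whose dictionary phrases appear in the ad."""
    tokens = normalise(ad_text)
    return {
        skill
        for skill, phrases in SKILLS_DICTIONARY.items()
        if any(contains_sequence(tokens, normalise(p)) for p in phrases)
    }


ad = ("Se requiere auxiliar con manejo de Excel, bases de datos "
      "y excelente servicio al cliente.")
print(sorted(find_skills(ad)))  # ['atencion al cliente', 'excel', 'sql']
```

Because matching works on whole tokens, "excelente" does not trigger the "excel" skill, and normalising the dictionary phrases with the same pipeline as the ads keeps plural and accented variants comparable.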
Moreover, the book proposes an n-gram-based method to reduce duplication (because information is collected from several job portals, some job advertisements appear more than once) and a LASSO-based method to impute missing values, such as education and wages (Chapter 6). By implementing and extending these novel mixed methods, 6) this document improves data collection and clarifies the methodological adjustments required to collect and organise information from job portals.
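The duplicate-reduction idea can be illustrated with a sketch that compares word trigrams between ads using Jaccard similarity. The n-gram length, similarity measure, and threshold used in the book are not stated here, so `n=3`, Jaccard similarity, and `threshold=0.8` are assumptions, as are the function names.

```python
import re
from itertools import combinations


def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Set of word n-grams in a lowercased ad (punctuation ignored)."""
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < n:
        # Very short ads: fall back to the whole token sequence as one "gram".
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_duplicates(ads: list[str], n: int = 3,
                    threshold: float = 0.8) -> list[tuple[int, int]]:
    """Index pairs of ads whose n-gram Jaccard similarity reaches the threshold."""
    grams = [word_ngrams(ad, n) for ad in ads]
    return [
        (i, j)
        for i, j in combinations(range(len(ads)), 2)
        if jaccard(grams[i], grams[j]) >= threshold
    ]


ads = [
    "Auxiliar contable con experiencia en NIIF y manejo de Excel en Bogotá",
    "Auxiliar contable con experiencia en NIIF y manejo de Excel en Bogotá. Postúlate ya",
    "Desarrollador web con conocimientos en PHP y SQL en Medellín",
]
print(find_duplicates(ads))  # [(0, 1)]
```

Here the second ad repeats the first with a short call to action appended, so 10 of its 12 trigrams are shared (similarity of about 0.83). Pairwise comparison grows quadratically with the number of ads; for very large collections, MinHash with locality-sensitive hashing is a common way to find candidate pairs before computing exact similarities.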
As a result of these methods, a vacancy database was consolidated for the period from January 1, 2016 to December 31, 2018 (Chapter 7). The book makes a further methodological contribution by 7) proposing a framework to evaluate the internal (consistency) and external (representativeness) validity of this vacancy database. To test internal validity, variables such as wages, occupations, and education were compared statistically to identify biases, errors, and inconsistencies within the database. Evaluating external validity was particularly challenging because countries like Colombia have no vacancy census (or any equivalent) against which to compare information collected from job portals. Despite these obstacles, this book develops and applies a methodological framework to evaluate the vacancy database. It implements a detailed comparison between official information available in the country (i.e. household surveys) and the vacancy data across indicators such as vacancies, employment, new hires, unemployment, and occupational structures and their dynamics over the study period. This comparison makes it possible to identify potential biases (e.g. over- or underrepresentation of certain occupational groups) in the vacancy database (Chapter 8).
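One simple way to express the over- or underrepresentation of occupational groups is a representation ratio: a group's share in the vacancy data divided by its share in a household-survey benchmark (for instance, new hires estimated from the GEIH). This is an illustrative sketch under that assumption, not the book's exact procedure; survey counts would normally be weighted with expansion factors, and the figures below are invented for demonstration only.

```python
def shares(counts: dict[str, float]) -> dict[str, float]:
    """Convert (possibly survey-weighted) counts into shares of the total."""
    total = sum(counts.values())
    return {group: value / total for group, value in counts.items()}


def representation_ratios(vacancy_counts: dict[str, float],
                          survey_counts: dict[str, float]) -> dict[str, float]:
    """Ratio of each group's share in the vacancy data to its share in the survey.

    > 1: group overrepresented in job portal data; < 1: underrepresented.
    Groups with a zero survey share are skipped to avoid division by zero.
    """
    v_share = shares(vacancy_counts)
    s_share = shares(survey_counts)
    return {g: v_share.get(g, 0.0) / s_share[g] for g in s_share if s_share[g] > 0}


# Purely hypothetical counts by ISCO major group (not results from this book).
vacancies = {"Professionals": 400, "Clerical support workers": 300,
             "Elementary occupations": 300}
new_hires = {"Professionals": 200, "Clerical support workers": 300,
             "Elementary occupations": 500}

for group, ratio in representation_ratios(vacancies, new_hires).items():
    print(f"{group}: {ratio:.2f}")
```

With these toy figures the sketch prints ratios of 2.00, 1.00 and 0.60, flagging professionals as overrepresented and elementary occupations as underrepresented relative to the benchmark.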