Semantic Harmonisation of Numeric Data from Open Government Data

pier.buttigieg [ at ]


Open tabular data published under open government initiatives typically contain a spatial dimension, a temporal dimension, and the numeric data itself, which captures information such as health indicators, pollution readings, and sanitation status. "Semantic harmonisation" of numeric data entails linking numeric data columns to web-accessible semantic entities from an ontology, a machine-readable knowledge representation. These semantic entities are embedded in a knowledge graph, allowing information from disparate sources to be integrated under common semantic definitions across spatial and temporal dimensions. Multiple research efforts have contributed to recovering the semantics of numeric columns in tables; however, they are either restricted to a single domain or rely on the numeric data already existing as linked data tuples in known ontologies. We present a novel yet simple approach that uses a supervised machine learning classifier (Random Forests) and semantic web techniques to generate semantics for numeric columns in tabular data. The approach has been tested, with encouraging results, on over 100 tabular datasets from the Indian Open Government Data Portal, downloaded from multiple domains such as "Health and Family Welfare", "Agriculture", and "Environment". We also present a use case for this work, which is being implemented in collaboration with the ministries of the Government of Karnataka for knowledge aggregation and dissemination of sustainable development data.
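To make the idea of harmonisation in the abstract concrete, the following is a minimal sketch of what linking numeric columns to a shared semantic entity could allow. It is an illustration under stated assumptions, not the authors' implementation: the entity IRI (on the placeholder domain example.org), the table layout, the column names, the district name, and all values are hypothetical. The classification step (Random Forests) is not shown; the sketch assumes the column annotations already exist.

```python
from collections import defaultdict

# Hypothetical ontology IRI; example.org is a placeholder, not a real ontology.
INFANT_MORTALITY = "http://example.org/ontology/InfantMortalityRate"

# Two made-up tables that name the same indicator differently.
# "column_annotations" maps a numeric column to its (assumed) semantic entity.
table_a = {
    "column_annotations": {"IMR": INFANT_MORTALITY},
    "rows": [{"district": "District A", "year": 2015, "IMR": 30.0}],
}
table_b = {
    "column_annotations": {"infant_deaths_per_1000": INFANT_MORTALITY},
    "rows": [
        {"district": "District A", "year": 2016, "infant_deaths_per_1000": 28.0}
    ],
}


def harmonise(tables):
    """Group values by (entity IRI, district, year), ignoring column names."""
    merged = defaultdict(list)
    for table in tables:
        for row in table["rows"]:
            for column, entity in table["column_annotations"].items():
                if column in row:
                    key = (entity, row["district"], row["year"])
                    merged[key].append(row[column])
    return dict(merged)


observations = harmonise([table_a, table_b])
```

Because both columns point at the same entity, values from the two tables end up keyed by that entity plus the spatial and temporal dimensions, regardless of how each source named its column.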

Item Type
Conference (Talk)
Event Details
CoDS-COMAD '19: 6th ACM IKDD CoDS and 24th COMAD, 01 Jan 2019 - 01 Jan 2019, Kolkata, India.
DOI 10.1145/3297001.3297032

Cite as
Subramanian, A., RR, P. K., Vikkurthi, M. and Buttigieg, P. L. (2019): Semantic Harmonisation of Numeric Data from Open Government Data, CoDS-COMAD '19: 6th ACM IKDD CoDS and 24th COMAD, Kolkata, India, January 2019 - January 2019. doi: 10.1145/3297001.3297032
