IMFD Talk: "Features and metrics for data formats"

Dominik Tomaszuk, University of Bialystok (Poland).
October 3, 2019, 12:00
Philippe Flajolet Room (3rd floor, West Building)
Instituto Milenio Fundamentos de los Datos

Abstract: In information technology, a data format defines a set of syntax rules for encoding data. Many data formats exist today for encoding text, images, video and other types of data. A single data format often supports different types of data, and the same data can be encoded in different data formats. This many-to-many relationship raises several questions: What is the best data format? Are two data formats comparable? What kinds of data (or data models) can a data format support? All of these questions relate to the features of a data format. Data format documentation often contains claims such as "lightweight format", "concise format", and "human-readable format". Unfortunately, these adjectives are of little use, because they have no standard meaning. In this talk we propose a set of features for data formats (e.g. Flexibility), each with a clear definition and evaluation metrics. We then use these metrics to compare general-purpose data formats (e.g. XML and CSV) and application-oriented formats (e.g. GraphML and GraphSON).


Bio: Dr Dominik Tomaszuk is a researcher at the University of Bialystok, Faculty of Mathematics and Informatics (Institute of Informatics), Poland. Dominik holds an M.Sc. (2008) in Computer Science from the Bialystok University of Technology, Poland, and a Ph.D. (2014) in Computer Science from the Warsaw University of Technology, Poland. His current research focuses on the Semantic Web, RDF, Property Graphs, NoSQL databases and cheminformatics.



DCC Communications