Predictive spreadsheet autocompletion with constraints
2020 (English). In: Machine Learning, ISSN 0885-6125, E-ISSN 1573-0565, Vol. 109, no 2, p. 307-325
Article in journal (Refereed), Published
Abstract [en]
Spreadsheets are arguably the most accessible data-analysis tool and are used by millions of people. Despite the fact that they lie at the core of most business practices, working with spreadsheets can be error prone, usage of formulas requires training and, crucially, spreadsheet users do not have access to state-of-the-art analysis techniques offered by machine learning. To tackle these issues, we introduce the novel task of predictive spreadsheet autocompletion, where the goal is to automatically predict the missing entries in the spreadsheets. This task is highly non-trivial: cells can hold heterogeneous data types and there might be unobserved relationships between their values, such as constraints or probabilistic dependencies. Critically, the exact prediction task itself is not given. We consider a simplified, yet non-trivial, setting and propose a principled probabilistic model to solve it. Our approach combines black-box predictive models specialized for different predictive tasks (e.g., classification, regression) and constraints and formulas detected by a constraint learner, and produces a maximally likely prediction for all target cells that is consistent with the constraints. Overall, our approach brings us one step closer to allowing end users to leverage machine learning in their workflows without writing a single line of code.
Place, publisher, year, edition, pages
Springer-Verlag New York, 2020. Vol. 109, no 2, p. 307-325
Keywords [en]
Spreadsheets Autocompletion, Bayesian Networks, Constraint Learning, Machine Learning
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:oru:diva-83315
DOI: 10.1007/s10994-019-05841-y
ISI: 000492576000001
Scopus ID: 2-s2.0-85074591152
OAI: oai:DiVA.org:oru-83315
DiVA, id: diva2:1442447
Note
Funding Agency:
European Research Council (ERC) 694980
2020-06-17, 2020-06-17, 2020-11-19. Bibliographically approved