Probabilistic Approach to Data Editing: Contributions to Editing in Survey Sampling
Örebro University, Örebro University School of Business.
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The efficiency and quality of data editing processes are challenges for National Statistical Institutes (NSIs) in producing reliable official statistics. The traditional approach to data editing, heavily reliant on manual interventions, is resource-intensive and may introduce biases, impacting the overall accuracy of statistical estimates. This thesis aims to address these challenges by developing an innovative editing framework based on probabilistic theory, allowing for a more resource-efficient editing process while providing accurate estimates of data quality. Furthermore, the thesis proposes an estimation procedure that accounts for various error sources, offering unbiased estimates of population parameters with appropriate measures of accuracy.

In addition to the introductory part, the thesis is structured around four key papers, each contributing to the overall objective of improving data editing and estimation processes in official statistics. Paper I presents a combined selective and probabilistic editing approach that maintains data quality while reducing resource demands. Paper II explores the integration of probabilistic editing with generalized regression (GREG) estimation, demonstrating improved accuracy in population parameter estimation. Paper III extends the framework to address nonresponse errors alongside measurement errors, using a three-phase sampling setup. Paper IV investigates the impact of various score functions in the probabilistic editing framework, emphasizing the importance of selecting effective score functions to minimize variance and improve estimate accuracy. Each paper contains, in addition to a theoretical part, an empirical section where concepts are numerically illustrated based on either real data or synthetic data.

Place, publisher, year, edition, pages
Örebro: Örebro University, 2025, p. 27
Series
Örebro Studies in Statistics, ISSN 1651-8608 ; 10
Keywords [en]
data editing, selective editing, measurement error, survey statistics
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:oru:diva-120183
ISBN: 9789175296395 (print)
ISBN: 9789175296401 (electronic)
OAI: oai:DiVA.org:oru-120183
DiVA, id: diva2:1946896
Public defence
2025-04-15, Örebro universitet, Långhuset, Hörsal L3, Fakultetsgatan 1, Örebro, 13:15 (English)
Available from: 2025-03-24 Created: 2025-03-24 Last updated: 2025-04-09 Bibliographically approved
List of papers
1. Probability-sampling approach to editing
2009 (English). In: Austrian journal of statistics, ISSN 1026-597X, Vol. 38, no 3, p. 171-182. Article in journal (Refereed) Published
Abstract [en]

Editing for measurement errors is always part of data processing. In traditional editing, all data records are checked for errors and inconsistencies. In a new way of editing, only the subset with the most important erroneous responses is considered for editing. This approach is applied in selective editing procedures, which have been shown to save resources considerably. However, selective editing lacks a probabilistic basis and the properties of estimators cannot be established using standard methods. In particular, bias properties of the estimator are unknown except for level estimates based on historical data. This paper proposes combining selective editing with an editing procedure based on the traditional probability-sampling framework. The variance of a bias-corrected Horvitz-Thompson estimator is derived and a variance estimator is proposed. The results of a simulation study support the use of the combined editing procedure.
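
The idea of a bias-corrected Horvitz-Thompson estimator can be sketched numerically. The following is a minimal illustration, not the paper's own estimator: it assumes that records are sent to editing by Poisson sampling with probabilities `edit_probs` (a record with probability 1 is always edited, as in selective editing), and that the corrected value of a record is known only if it was edited. All function names and the toy data are hypothetical.

```python
import random


def ht_total(values, pi):
    """Horvitz-Thompson estimate of a total from sampled values."""
    return sum(v / p for v, p in zip(values, pi))


def poisson_sample(probs, rng):
    """Poisson sampling: each unit is selected independently with its own probability."""
    return [rng.random() < p for p in probs]


def bias_corrected_total(reported, edited, pi, edit_probs, selected):
    """HT total of reported values minus an estimate of the measurement bias.

    edited[k] is used only when selected[k] is True, mimicking the fact that
    corrected values are observed only for records sent to editing.
    """
    raw = ht_total(reported, pi)
    bias_hat = sum(
        (y - z) / (p * q)
        for y, z, p, q, sel in zip(reported, edited, pi, edit_probs, selected)
        if sel
    )
    return raw - bias_hat


# Toy data (hypothetical): four sampled records, two of them reported with error.
reported = [10.0, 12.0, 500.0, 9.0]
true_vals = [10.0, 12.0, 50.0, 8.0]
pi = [0.5, 0.5, 0.25, 0.5]          # sample inclusion probabilities
edit_probs = [0.2, 0.2, 1.0, 0.4]   # the suspicious record is edited with certainty

rng = random.Random(1)
selected = poisson_sample(edit_probs, rng)
estimate = bias_corrected_total(reported, true_vals, pi, edit_probs, selected)
print(estimate)
```

Under these assumptions, averaging the corrected estimate over all possible editing subsamples reproduces the Horvitz-Thompson total of the true values, which is the sense in which the bias correction works.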

Place, publisher, year, edition, pages
Österreichische Statistische Gesellschaft, 2009
National Category
Probability Theory and Statistics
Research subject
Statistics
Identifiers
urn:nbn:se:oru:diva-14116 (URN)
10.17713/ajs.v38i3.270 (DOI)
Available from: 2011-01-20 Created: 2011-01-20 Last updated: 2025-04-03 Bibliographically approved
2. GREG estimation and probabilistic editing
2012 (English). In: Metron, ISSN 0026-1424, Vol. 70, no 2-3, p. 133-144. Article in journal (Refereed) Published
Abstract [en]

The purpose of editing is to correct erroneous entries in the dataset and assure the quality of data. It takes a lot of resources to correct all errors, so editing procedures in which only a subset of errors is corrected are sought after. Correcting only a subset of all errors will influence the final estimates, and tools for evaluating properties of the estimates, such as bias and variance, need to be available. This paper introduces a probabilistic editing procedure where the responses are selected for editing through Poisson Mixture (PoMix) sampling and a bias-adjusted GREG estimator is used for estimation. An expression for the variance of the bias-adjusted GREG estimator is derived, and a variance estimator is proposed. The effectiveness of the proposed editing procedure and the GREG estimator is illustrated using empirical data from Statistics Sweden.
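
For orientation, a plain GREG estimator of a total can be sketched as follows. This is a minimal sketch under stated assumptions: one auxiliary variable `x` with known population total `x_total`, and a working model without intercept fitted by design-weighted least squares. It does not reproduce the paper's bias adjustment or the PoMix selection; the function name and the data are hypothetical.

```python
def greg_total(y, x, pi, x_total):
    """GREG estimate of the total of y using one auxiliary variable x.

    Assumes the working model y = beta * x (no intercept, constant variance),
    fitted by design-weighted least squares.
    """
    w = [1.0 / p for p in pi]  # design weights
    beta = sum(wk * xk * yk for wk, xk, yk in zip(w, x, y)) / sum(
        wk * xk * xk for wk, xk in zip(w, x)
    )
    ht_y = sum(wk * yk for wk, yk in zip(w, y))
    ht_x = sum(wk * xk for wk, xk in zip(w, x))
    # HT total of y plus a regression adjustment for the HT error in x.
    return ht_y + beta * (x_total - ht_x)


# Toy data (hypothetical).
x = [2.0, 4.0, 5.0, 8.0]
y = [4.1, 7.8, 10.3, 15.9]
pi = [0.2, 0.4, 0.5, 0.8]
x_total = 60.0  # assumed known population total of x

estimate = greg_total(y, x, pi, x_total)
print(estimate)
```

When `y` is exactly proportional to `x`, the regression adjustment removes the sampling error completely, which is why a strong auxiliary variable makes the GREG estimator more precise than the plain Horvitz-Thompson estimator.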

Place, publisher, year, edition, pages
Springer, 2012
Keywords
GREG estimator, Editing, Two-phase sampling design, Bias estimation
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:oru:diva-120397 (URN)
10.1007/BF03321971 (DOI)
000211686100003 ()
Available from: 2025-04-03 Created: 2025-04-03 Last updated: 2025-04-03 Bibliographically approved
3. Estimation with probability edited survey data under nonresponse
2025 (English). Report (Other academic)
Abstract [en]

Probabilistic editing has been introduced to enable valid inference using established survey sampling theory in situations when some of the collected data points may have measurement errors and are therefore submitted to an editing process. To reduce the editing effort and avoid over-editing, current practice most often uses selective editing, a form of editing that limits the edit checks to those potential errors that, if indeed in error, are likely to have the biggest impact on the estimates to be produced. However, selective editing is not grounded in the probability theory associated with survey sampling, and cannot provide expressions for point and variance estimates that account for the uncertainties introduced by selective editing.

In the spirit of the total survey error paradigm, this paper extends the previous work on probabilistic editing by proposing an estimation procedure that provides valid inference when two kinds of nonsampling error are simultaneously present, in addition to the sampling error: the measurement error, requiring an editing step, and the practically unavoidable nonresponse error which also needs to be taken into account when producing unbiased estimates.

In a three-phase selection setup, bias due to measurement error is estimated through probabilistic editing while weight adjustment employing auxiliary information is used to deal with nonresponse. An estimator based on calibration for nonresponse and corrected for bias due to measurement error is introduced. Its theoretical variance and an estimator of the variance are derived. A simulation study illustrates the three-phase selection setup and the practical performance of the derived point and variance estimators.
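
The weight adjustment for nonresponse can be illustrated with a simple linear calibration. This is a minimal sketch, assuming a single auxiliary variable `x` with known population total `x_total`; the paper's three-phase estimator, its auxiliary information, and its measurement-bias correction are not reproduced here, and all names and data are hypothetical.

```python
def calibrate_weights(d, x, x_total):
    """Linear calibration of design weights d on one auxiliary variable x.

    Returns weights w_k = d_k * (1 + lam * x_k), with lam chosen so that the
    weighted respondent total of x equals the known population total x_total.
    """
    dx = sum(dk * xk for dk, xk in zip(d, x))
    dxx = sum(dk * xk * xk for dk, xk in zip(d, x))
    lam = (x_total - dx) / dxx
    return [dk * (1.0 + lam * xk) for dk, xk in zip(d, x)]


def weighted_total(values, weights):
    """Weighted estimate of a total."""
    return sum(wk * vk for wk, vk in zip(weights, values))


# Toy respondent data (hypothetical): design weights, auxiliary and study variable.
d = [4.0, 4.0, 2.0, 2.0]
x = [1.0, 3.0, 2.0, 5.0]
y = [3.0, 8.0, 5.0, 14.0]
x_total = 40.0  # assumed known population total of x

w = calibrate_weights(d, x, x_total)
print(weighted_total(x, w))  # matches x_total up to rounding
print(weighted_total(y, w))  # calibrated estimate of the total of y
```

Because respondents alone under-represent the population (their design-weighted total of `x` is 30 against a known total of 40), the calibrated weights are inflated so that the auxiliary total is reproduced.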

Place, publisher, year, edition, pages
Örebro: Örebro University School of Business, 2025. p. 43
Series
Working Papers, School of Business, ISSN 1403-0586 ; 3/2025
Keywords
nonsampling errors, probabilistic editing, selective editing, calibration estimator, measurement bias estimation
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:oru:diva-120398 (URN)
Available from: 2025-04-03 Created: 2025-04-03 Last updated: 2025-04-03 Bibliographically approved
4. An Exploration of Score Functions for Probability Editing
(English). Manuscript (preprint) (Other academic)
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:oru:diva-120399 (URN)
Available from: 2025-04-03 Created: 2025-04-03 Last updated: 2025-04-03 Bibliographically approved

Open Access in DiVA

Cover (253 kB), 43 downloads
File information
File name: COVER01.pdf
File size: 253 kB
Checksum SHA-512: 97ca51e1b95379fcff6052a4d2ede98008a914789a9ada450d02cc74f2c7c91c5bc6ca2c253a309ec5be78c2d1e337bab43b6aed7ecbe13b1ff50a52c5e8dd37
Type: cover
Mimetype: application/pdf
Spikblad (161 kB), 33 downloads
File information
File name: SPIKBLAD01.pdf
File size: 161 kB
Checksum SHA-512: 85efc8e97a94d76b78ea0d571e76f62148dad51254563594407c131dac5851fb6dee8ec006ebb58f30fd777ebc42cef2b5f5dc322a02c35b4dd6c0a437931587
Type: spikblad
Mimetype: application/pdf

Authority records

Ilves, Maiki

The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
