Biomarker discovery: classification using pooled samples
Leibniz Institute for Farm Animal Biology, Dummerstorf, Germany.
Leibniz Institute for Farm Animal Biology, Dummerstorf, Germany. ORCID iD: 0000-0002-7173-5579
Leibniz Institute for Farm Animal Biology, Dummerstorf, Germany.
2013 (English). In: Computational statistics (Zeitschrift), ISSN 0943-4062, E-ISSN 1613-9658, Vol. 28, no 1, p. 67-106
Article in journal (Refereed), Published
Abstract [en]

RNA-sample pooling is sometimes inevitable, but should be avoided in classification tasks like biomarker studies. Our simulation framework investigates a two-class classification study based on gene expression profiles to point out how strongly the outcomes of single-sample designs differ from those of pooling designs. The results show how the effects of pooling depend on pool size, discriminating pattern, number of informative features and the statistical learning method used (support vector machines with linear and radial kernel, random forest (RF), linear discriminant analysis, powered partial least squares discriminant analysis (PPLS-DA) and partial least squares discriminant analysis (PLS-DA)). As measures of the pooling effect, we consider prediction error (PE) and the coincidence of important feature sets for classification based on PLS-DA, PPLS-DA and RF. In general, PPLS-DA and PLS-DA show constant PE with increasing pool size and low PE for patterns for which the convex hull of one class is not a cover of the other class. The coincidence of important feature sets is larger for PLS-DA and PPLS-DA than it is for RF. RF shows the best results for patterns in which the convex hull of one class is a cover of the other class, but these depend strongly on the pool size. We complete the PE results with experimental data which we pool artificially. The PE of PPLS-DA and PLS-DA is again low and least influenced by pooling. Additionally, we show under which assumption the PLS-DA loading weights, as a measure for importance of features regarding classification, are equal for the different designs.
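
To make the idea of artificially pooling data more concrete, here is a minimal Python sketch. It is not the authors' simulation framework: it assumes that a pool can be modelled as the arithmetic mean of the expression profiles of `pool_size` samples from the same class, and it swaps in a simple nearest-centroid classifier in place of the learning methods named in the abstract. The function names, the data dimensions and the class shift of 10 are all hypothetical choices for illustration.

```python
import numpy as np


def pool_samples(X, y, pool_size, rng):
    """Average `pool_size` randomly chosen same-class samples into one pooled profile.

    Assumption: pooling is modelled as a plain mean on the expression scale.
    Samples that do not fill a complete pool are dropped.
    """
    pooled_X, pooled_y = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        n_pools = len(idx) // pool_size
        for p in range(n_pools):
            members = idx[p * pool_size:(p + 1) * pool_size]
            pooled_X.append(X[members].mean(axis=0))
            pooled_y.append(label)
    return np.array(pooled_X), np.array(pooled_y)


def nearest_centroid_error(X_train, y_train, X_test, y_test):
    """Prediction error (misclassification rate) of a nearest-centroid rule.

    This classifier is a stand-in for illustration, not one of the methods
    compared in the study.
    """
    labels = np.unique(y_train)
    centroids = np.array([X_train[y_train == l].mean(axis=0) for l in labels])
    # Euclidean distance from every test sample to every class centroid
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    pred = labels[dists.argmin(axis=1)]
    return float(np.mean(pred != y_test))


# Hypothetical toy data: two classes, strongly separated in every feature
rng = np.random.default_rng(0)
n_per_class, n_features = 12, 5
y = np.repeat([0, 1], n_per_class)
X = rng.normal(size=(2 * n_per_class, n_features))
X[y == 1] += 10.0

# Train on pools of size 3, test on the original single samples
X_pooled, y_pooled = pool_samples(X, y, pool_size=3, rng=rng)
pe = nearest_centroid_error(X_pooled, y_pooled, X, y)
print(X_pooled.shape, pe)
```

Because every sample lands in exactly one pool here, the class means of the pooled data equal those of the single-sample data, while the within-class spread shrinks. A fuller comparison would repeat this over several pool sizes and discriminating patterns, which this sketch does not attempt.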

Place, publisher, year, edition, pages
Heidelberg, Germany: Springer, 2013. Vol. 28, no 1, p. 67-106
Keywords [en]
Sample pooling, biomarker search, statistical learning methods, partial least squares discriminant analysis, prediction error
National Category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:oru:diva-40740
DOI: 10.1007/s00180-011-0302-0
ISI: 000315163600006
Scopus ID: 2-s2.0-84874221732
OAI: oai:DiVA.org:oru-40740
DiVA, id: diva2:778578
Available from: 2015-01-11 Created: 2015-01-11 Last updated: 2018-01-30 Bibliographically approved

Authority records

Repsilber, Dirk
