countfitteR: count data analysis for precision medicine

Jarosław Chilimoniuk

Count data, one of the most common data types in many fields, are by default assumed to follow the Poisson distribution. This assumption, however, may lead to biased results and faulty conclusions in data sets with excess zero values (zero-inflation), a variance larger than the mean (overdispersion), or both. In such cases, assuming a Poisson distribution skews the estimates of the mean and variance, and other models such as the negative binomial (NB), zero-inflated Poisson (ZIP) or zero-inflated NB (ZINB) distributions should be employed. Selecting the most appropriate distribution model, however, is not trivial. To support and simplify this process, in particular for experimental researchers, we have implemented the countfitteR software. We describe the performance of this software using real-life examples of count data in precision medicine: DNA double-strand breaks are a highly specific and sensitive molecular biomarker for monitoring DNA damage in cancer, aging research and the evaluation of drug efficacy, and are detected and quantified by foci formation in fluorescence microscopy. In the analysis of a large number of data sets from such experiments, countfitteR demonstrated statistical performance equal or superior to that of the commonly employed two-step procedure, with an overall power of up to 98%. In addition, it still provided information in cases where the two-step procedure yielded no result at all. Originally designed for the analysis of foci in biomedical image data, countfitteR can be used in a variety of areas where non-Poisson distributed count data are prevalent. Our software is available as an R package and a web server: http://biongram.biotech.uni.wroc.pl/countfitteR/.
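To make the two departures from the Poisson assumption concrete, the following minimal sketch computes simple descriptive diagnostics for a vector of counts. It is an illustration only and does not reproduce the model selection performed by countfitteR; the function name `poisson_diagnostics` and the example counts are hypothetical. Under a Poisson distribution the variance equals the mean (dispersion index close to 1) and the expected fraction of zeros is exp(-mean), so a much larger dispersion index suggests overdispersion and a much larger observed zero fraction suggests zero-inflation.

```python
import math
from statistics import mean, variance


def poisson_diagnostics(counts):
    """Descriptive checks of the Poisson assumption (hypothetical helper).

    Returns the sample mean, the sample variance, the dispersion index
    (variance / mean), the observed fraction of zeros, and the fraction of
    zeros a Poisson distribution with the same mean would predict.
    """
    n = len(counts)
    m = mean(counts)
    v = variance(counts)  # unbiased sample variance (divides by n - 1)
    observed_zero = sum(1 for c in counts if c == 0) / n
    poisson_zero = math.exp(-m)  # P(X = 0) for a Poisson with mean m
    return {
        "mean": m,
        "variance": v,
        "dispersion_index": v / m if m > 0 else float("nan"),
        "observed_zero_fraction": observed_zero,
        "poisson_zero_fraction": poisson_zero,
    }


# Made-up foci counts per cell, for illustration only (not data from the study).
example_counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 5, 8, 13]
diagnostics = poisson_diagnostics(example_counts)

for name, value in diagnostics.items():
    print(f"{name}: {value:.3f}")
```

For these made-up counts the variance is several times the mean and half of the observations are zero, far more than a Poisson model with the same mean would predict. Such descriptive checks only hint at a problem; choosing among the Poisson, NB, ZIP and ZINB models still requires fitting and comparing them.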