A Computational Technique for Eliminating Low Complexity Regions in DNA Sequences

Olugbenga Ayomide Madamidola; Temitope J Adedeji; Ojoma Lauretta Osajiuba

Abstract

Low complexity regions (LCRs) are short sequence repeats in DNA sequences. These repeats can produce high alignment scores against unrelated sequences in sequence databases, so they need to be eliminated to improve homology searches. In this study, a computational approach was developed to eliminate low complexity regions in “real” DNA sequences by classifying nucleotides into Significant Regions (SR) and Non-Significant Regions (NSR), which represent the positive and negative classes, respectively. A Support Vector Machine (SVM) was used for the classification task, with separate training and testing datasets. The DNA sequences of Plasmodium falciparum were encoded using the positional information and density information techniques, which yielded two training and two testing datasets. The SVM model was able to eliminate low complexity regions in “real” DNA sequences, with promising results.
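The abstract does not define its encodings, so the following is only an illustrative sketch of what a density-style encoding might look like. It assumes one common reading of "nucleotide density": for each position, the fraction of the sequence prefix that matches the nucleotide at that position. The function name `nucleotide_density` is hypothetical and is not taken from the study.

```python
def nucleotide_density(seq):
    """Assumed density encoding (not confirmed by the study).

    For each 1-based position i, return the number of times seq[i]
    has occurred in the prefix seq[1..i], divided by i.
    """
    counts = {}      # running count of each nucleotide seen so far
    densities = []
    for i, base in enumerate(seq.upper(), start=1):
        counts[base] = counts.get(base, 0) + 1
        densities.append(counts[base] / i)
    return densities


# A homopolymer run keeps the density at 1.0 until a new base appears.
print(nucleotide_density("AAAT"))  # [1.0, 1.0, 1.0, 0.25]
```

Under this assumption, repetitive stretches produce sustained high values, which is the kind of numeric signal a classifier such as an SVM could use to separate SR from NSR nucleotides.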


Journal of Next Generation Sequencing & Applications
