Title: Accelerating long-read overlap detection for genome assembly with a two-hash table strategy
Authors: Mahdie Eghdami, Mahmoud Naghibzadeh, Hamid Noori
Abstract
Genome assembly from long reads produced by advanced sequencing technologies (e.g., PacBio, Nanopore) has gained widespread popularity because such reads can span larger genomic regions. However, a crucial step in assembling these reads into a complete genome is detecting overlaps between them, a process that is time-consuming and thus challenging. To address this slow runtime, we introduce a novel method that significantly expedites overlap detection while maintaining accuracy. Although our method follows the traditional three-phase approach of hash table construction, candidate overlap detection, and candidate auditing, we incorporate several key innovations: 1) In addition to the traditional hash table, we efficiently construct a second hash table with a different k-mer size. This second table is used to refine candidate detection and to estimate the overlap region, while the traditional first hash table is used to find anchors within the estimated overlap region in order to audit candidates and determine the exact candidate region. 2) Overlap candidates are efficiently and accurately identified using the second hash table. 3) A two-step strategy is employed to reduce the computational overhead of candidate auditing: first, we estimate the overlap region; then we audit the candidate and confirm the overlap region using dynamic programming and the first hash table. Comparative results demonstrate that the proposed overlap detector significantly improves both assembly quality and speed by efficiently detecting overlapping reads, enabling faster and more comprehensive genome assemblies.
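The three phases named in the abstract (hash table construction, candidate detection with region estimation, candidate auditing) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the k-mer sizes, the efficient construction of the second table, the region estimator, or the dynamic programming formulation. Everything below is an assumption made for illustration, including the function names, the choice of a larger k for the second table, the median-diagonal region estimate, the chaining DP, and all thresholds. Reverse-complement overlaps are ignored.

```python
import random
from collections import defaultdict

def build_kmer_table(reads, k):
    """Map each k-mer to the (read_id, position) pairs where it occurs."""
    table = defaultdict(list)
    for rid, seq in enumerate(reads):
        for pos in range(len(seq) - k + 1):
            table[seq[pos:pos + k]].append((rid, pos))
    return table

def find_candidates(table, min_shared):
    """Return read pairs (a, b), a < b, sharing at least min_shared k-mer hits."""
    hits = defaultdict(list)  # (a, b) -> [(pos_in_a, pos_in_b), ...]
    for occurrences in table.values():
        # Occurrences were appended in read order, so a <= b below.
        for i, (a, pa) in enumerate(occurrences):
            for b, pb in occurrences[i + 1:]:
                if a != b:
                    hits[(a, b)].append((pa, pb))
    return {pair: h for pair, h in hits.items() if len(h) >= min_shared}

def estimate_region(hits, len_a, len_b):
    """Assumed estimator: take the median diagonal (pos_a - pos_b) of the
    hits and project read b onto read a along that diagonal."""
    diagonals = sorted(pa - pb for pa, pb in hits)
    d = diagonals[len(diagonals) // 2]
    return max(0, d), min(len_a, len_b + d)

def audit(reads, table_small, a, b, region, k_small, min_chain):
    """Collect small-k anchors inside the estimated region of read a, then
    accept the pair if the longest colinear anchor chain is long enough."""
    start_a, end_a = region
    anchors = []
    for pa in range(start_a, end_a - k_small + 1):
        for rid, pb in table_small.get(reads[a][pa:pa + k_small], ()):
            if rid == b:
                anchors.append((pa, pb))
    # O(n^2) dynamic programming: longest chain increasing in both coordinates.
    best = [1] * len(anchors)
    for i in range(len(anchors)):
        for j in range(i):
            if anchors[j][0] < anchors[i][0] and anchors[j][1] < anchors[i][1]:
                best[i] = max(best[i], best[j] + 1)
    return max(best, default=0) >= min_chain

def detect_overlaps(reads, k_small=8, k_large=12, min_shared=5, min_chain=20):
    """All parameter values are illustrative assumptions, not from the paper."""
    first = build_kmer_table(reads, k_small)    # "traditional" first table
    second = build_kmer_table(reads, k_large)   # second table, different k
    result = {}
    for (a, b), hits in find_candidates(second, min_shared).items():
        region = estimate_region(hits, len(reads[a]), len(reads[b]))
        if audit(reads, first, a, b, region, k_small, min_chain):
            result[(a, b)] = region  # estimated overlap interval on read a
    return result

# Toy data: two reads from one synthetic genome overlapping by 80 bases,
# plus one unrelated read.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(300))
unrelated = "".join(rng.choice("ACGT") for _ in range(200))
reads = [genome[:200], genome[120:], unrelated]
overlaps = detect_overlaps(reads)
print(overlaps)
```

In this sketch the expensive anchor search and chaining run only inside the region estimated from the second table, which mirrors the two-step auditing idea described in the abstract. Real long reads contain sequencing errors, so a practical detector would need error-tolerant seeding and scoring that this toy example omits.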
Keywords
De novo genome assembly, Long read overlap detection, Two-hash table strategy
@article{paperid:1106572,
author = {Eghdami, Mahdie and Naghibzadeh, Mahmoud and Noori, Hamid},
title = {Accelerating long-read overlap detection for genome assembly with a two-hash table strategy},
journal = {Computational Biology and Chemistry},
year = {2025},
volume = {119},
month = {December},
issn = {1476-9271},
pages = {108576--108588},
numpages = {12},
keywords = {De novo genome assembly, Long read overlap detection, Two-hash table strategy},
}
%0 Journal Article
%T Accelerating long-read overlap detection for genome assembly with a two-hash table strategy
%A Eghdami, Mahdie
%A Naghibzadeh, Mahmoud
%A Noori, Hamid
%J Computational Biology and Chemistry
%@ 1476-9271
%D 2025