
Recommendation

ONT long-read sequencing and Illumina short-read sequencing of 16S rDNA amplicons give comparable results in terms of bacterial community structure in marine sediments

Aymé Spor, based on reviews by 2 anonymous reviewers

A recommendation of:

Fine-scale congruence in bacterial community structure from marine sediments sequenced by short-reads on Illumina and long-reads on Nanopore

Alice Lemoinne, Guillaume Dirberg, Myriam Georges, Tony Robinet (2023), bioRxiv, ver.3, peer-reviewed and recommended by PCI Microbiology https://doi.org/10.1101/2023.06.06.541006

Abstract

Fine-scale congruence in bacterial community structure from marine sediments sequenced by short-reads on Illumina and long-reads on Nanopore

Following the development of high-throughput sequencers, environmental prokaryotic communities are usually described by metabarcoding with genetic markers within the 16S rRNA gene. However, short-read sequencing is limited in phylogenetic coverage and taxonomic resolution by primer choice and read length. On these critical points, nanopore sequencing, a rising technology suitable for long-read metabarcoding, has been much undervalued because of its relatively higher error rate per read. Here we compared the prokaryotic community structure in a mock community and in 52 sediment samples from two contrasting mangrove sites, as described by short reads on the 16S V4-V5 marker (ca. 0.4 kbp) sequenced on Illumina (MiSeq, V3), with that described by long reads on the nearly complete bacterial 16S (ca. 1.5 kbp) sequenced on Oxford Nanopore (MinION, R9.2). Short and long reads retrieved all the bacterial genera of the mock community, although both showed similar deviations from the expected proportions. For the sediment samples, after coverage-based rarefaction of reads and singleton filtering, co-inertia and Procrustes tests showed that the bacterial community structures inferred from short and long reads were significantly similar, both showing a comparable contrast between sites and a coherent sea-land orientation within sites. In our dataset, 84.7% and 98.8% of the short reads were assigned strictly to the same species and genus, respectively, as those detected by long reads. The primer specificities of the long 16S marker allowed it to detect 92.2% of the 309 families and 87.7% of the 448 genera detected by the short 16S V4-V5 marker. Long reads recorded 973 additional taxa not detected by short reads, of which 91.7% were identified to genus rank, some belonging to 11 exclusive phyla, albeit accounting for only 0.2% of total long reads.

microbial metabarcoding, environmental DNA, methods, primers, diversity
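The abstract relies on a coverage-based rarefaction of reads before communities are compared. As an illustration only, the sketch below uses the Chao & Jost (2012) estimator of sample coverage for a subsample of m out of n reads, finds the smallest depth whose expected coverage reaches a chosen target, and then subsamples reads without replacement. The function names, the binary search and the choice of target are assumptions made for this sketch; it is not the authors' pipeline, whose actual parameters are given in the preprint.

```python
import math
import random
from typing import List


def _log_comb(a: int, b: int) -> float:
    """Natural log of the binomial coefficient C(a, b), for 0 <= b <= a."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def expected_coverage(counts: List[int], m: int) -> float:
    """Chao & Jost (2012) coverage estimate for a subsample of m of the n reads (0 < m < n)."""
    n = sum(counts)
    if not 0 < m < n:
        raise ValueError("m must satisfy 0 < m < n")
    missing = 0.0
    for x in counts:
        # When n - x < m, C(n - x, m) is 0: the taxon is certain to appear in the subsample.
        if x > 0 and n - x >= m:
            missing += (x / n) * math.exp(_log_comb(n - x, m) - _log_comb(n - 1, m))
    return 1.0 - missing


def depth_for_coverage(counts: List[int], target: float) -> int:
    """Smallest subsample size whose expected coverage reaches `target`.

    Expected coverage never decreases with m, so a binary search is enough.
    """
    n = sum(counts)
    if n < 2 or expected_coverage(counts, n - 1) < target:
        return n  # target not reachable below full depth: keep every read
    lo, hi = 1, n - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if expected_coverage(counts, mid) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


def rarefy(counts: List[int], depth: int, rng: random.Random) -> List[int]:
    """Draw `depth` reads without replacement and return the new per-taxon counts."""
    pool = [taxon for taxon, x in enumerate(counts) for _ in range(x)]
    out = [0] * len(counts)
    for taxon in rng.sample(pool, depth):
        out[taxon] += 1
    return out
```

In practice one would call `depth_for_coverage` with the same target for every sample and rarefy each sample to its own depth, so that samples are standardised by completeness rather than by a fixed number of reads.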

Submission: posted 07 June 2023, validated 08 June 2023
Recommendation: posted 12 October 2023, validated 13 October 2023

Cite this recommendation as:
Spor, A. (2023) ONT long-read sequencing and Illumina short-read sequencing of 16S rDNA amplicons give comparable results in terms of bacterial community structure in marine sediments. Peer Community in Microbiology, 100014. https://doi.org/10.24072/pci.microbiol.100014

Recommendation

ONT long-read high-throughput sequencing is not routinely used for metabarcoding studies of microbial communities. Even though this technology is supposed to considerably improve phylogenetic coverage and taxonomic resolution, it initially suffered from relatively poor read accuracy. Assessment of the performance of this new approach in comparison with routinely used 16S rDNA short-read sequencing is therefore needed to validate its use.

The study by Lemoinne et al. (2023) offers a comprehensive comparison of two 16S rDNA metabarcoding approaches on marine sediment samples. By comparing Illumina short-read sequencing with ONT long-read sequencing, the authors conclude that bacterial community structures inferred from both technologies were similar. They also found that differences observed between sampling sites and along the sea-land orientation were comparable between the two technologies. However, the choice of technology still has an impact on the obtained results, notably in terms of bacterial diversity retrieved, taxonomic resolution, and replicability between biological replicates.
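The statement that community structures were "similar" rests on Procrustes (and co-inertia) tests between two ordinations of the same samples. The sketch below, written as an assumption-laden illustration rather than the authors' code, computes a symmetric Procrustes statistic m² for two 2-axis ordinations (for example PCoA scores computed separately from the short-read and long-read tables) and a permutation p-value in the spirit of the `protest` function of the R package vegan. Restricting to two axes is an assumption that keeps the rotation step in closed form.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _center_and_scale(points: Sequence[Point]) -> List[Point]:
    """Translate to the centroid and scale to unit total sum of squares."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    norm = math.sqrt(sum(x * x + y * y for x, y in centered))
    if norm == 0:
        raise ValueError("configuration has no spread")
    return [(x / norm, y / norm) for x, y in centered]


def procrustes_m2(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Procrustes m^2: 0 for identical shapes, up to 1 for unrelated ones."""
    if len(a) != len(b) or len(a) < 3:
        raise ValueError("need two configurations of the same >= 3 samples")
    x, y = _center_and_scale(a), _center_and_scale(b)
    # 2x2 cross-product matrix X^T Y
    m11 = sum(p[0] * q[0] for p, q in zip(x, y))
    m12 = sum(p[0] * q[1] for p, q in zip(x, y))
    m21 = sum(p[1] * q[0] for p, q in zip(x, y))
    m22 = sum(p[1] * q[1] for p, q in zip(x, y))
    # For a 2x2 matrix, sum of singular values = sqrt(||M||_F^2 + 2|det M|);
    # using |det| lets the best fit include a reflection as well as a rotation.
    frob = m11 ** 2 + m12 ** 2 + m21 ** 2 + m22 ** 2
    trace_norm = math.sqrt(frob + 2 * abs(m11 * m22 - m12 * m21))
    return max(0.0, 1.0 - trace_norm ** 2)


def protest(a: Sequence[Point], b: Sequence[Point],
            permutations: int = 999, seed: int = 1) -> Tuple[float, float]:
    """Permutation test: how often does shuffling sample labels fit at least as well?"""
    rng = random.Random(seed)
    observed = procrustes_m2(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if procrustes_m2(a, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

A small m² with a small p-value means that, after the best translation, scaling and rotation, samples occupy matching positions in both ordinations, which is the sense in which the two technologies give "comparable" community structures.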

Altogether, these results validate the use of ONT long-read sequencing for 16S metabarcoding of marine sediments. Similar comparisons in other, remote environments are needed, as this technology might offer new opportunities for field scientists without access to sequencing platforms to study the structure and composition of microbial communities.

Reference

Lemoinne, A., Dirberg, G., Georges, M., & Robinet, T. (2023). Fine-scale congruence in bacterial community structure from marine sediments sequenced by short-reads on Illumina and long-reads on Nanopore. bioRxiv, version 3, peer-reviewed and recommended by Peer Community in Microbiology. https://doi.org/10.1101/2023.06.06.541006

Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article. The authors declared that they comply with the PCI rule of having no financial conflicts of interest in relation to the content of the article.

Funding:
Office Français de la Biodiversité (OFB) and BOREA lab (MNHN)

Reviews

Evaluation round #1

DOI or URL of the preprint: https://doi.org/10.1101/2023.06.06.541006

Version of the preprint: 1

Author's Reply, 12 Sep 2023

Author's reply: https://doi.org/10.24072/pci.microbiol.100014.ar1

Decision by Aymé Spor, posted 17 Jul 2023, validated 18 Jul 2023

Dear authors,

Two reviewers have now evaluated your manuscript and given useful comments for improving its quality.

I suggest you take all the reviewers' comments into account for providing a revised version of the manuscript. More specifically, I would ask you to particularly take into account reviewer 2's comments regarding your experimental design (confounding factors between technologies and primers choice) and elaborate upon this in the revised version.

Sincerely yours,

Aymé Spor

https://doi.org/10.24072/pci.microbiol.100014.d1

Reviewed by anonymous reviewer 1, 30 Jun 2023

Dear Authors,
Thank you for the opportunity to review your manuscript. It presents a detailed comparison of two sequencing approaches that can have a large impact on the microbial alpha diversity estimated from biological samples. After rigorous analysis, a higher number of taxa is shown to be obtained with long-read sequencing. On the other hand, the similarity of the major patterns of community composition between the short- and long-read approaches is also demonstrated.

Although the authors avoid giving strong recommendations on which method should be used, I think that the presented results provide a valuable comparison that is useful for readers interested in methodological papers.

My line-by-line comments are listed below. I noted two major issues: the first concerns PCR cycling conditions (L152, L158, L173, L182) and the second concerns the input data for the random forest analysis (L311). I ask the authors to consider these and the other comments below.

Based on this, I think that the current version of the manuscript needs minor revisions.

L22, L25 - genera instead of genus
L27 - Here the statement can be stronger, I would omit "probably" since these are real reasons for discrepancies
L60 - please omit "works"
L62 - If there is a read-length limit for PacBio, it is certainly longer. I suggest avoiding a specific number (as it might soon be obsolete) and mentioning "tens of kbp" or similar
L106, L109 - please omit parentheses at the beginning of sentences
L139 - Please specify the type of ZymoBIOMICS mock sample, as there are several types on the manufacturer's website.
L144 - Starting from this section, but also in previous parts of the manuscript, typographic corrections need to be applied widely. This includes 16S instead of 16s, a dot as the decimal point, en dashes where appropriate, English quotation marks, the multiplication sign instead of x, etc.
L152, L158 - Altogether, 60 cycles of amplification across the primary and secondary PCRs seems like a lot. The authors want to avoid PCR biases and run triplicate PCRs (L145), which is certainly good. But the DNA then goes through so many cycles, which increases the chance of bias and of amplifying contaminants. Could you please include a reference if such an approach has been recommended?
L163 - Please include full names of sequencing kits as they are written at manufacturer's webpage.
L169 - I agree with the use of specific primers for Archaea, but I am missing an explanation of this approach in the Introduction or in the Methods. Could you please include one sentence on why archaeal primers were used in the case of Nanopore?
L173, L182 - Here, amplicons for Nanopore went through 55 cycles, which raises the same question of whether so many cycles are necessary.
L173, L174 - unclear meaning of values in brackets, please clarify
L177 - "diluted" may be better than "reduced"
L188 - "protocol from Nanopore website" There is no need to specify from which website the protocol was downloaded. Alternatively, you can include link as proper reference.
L195 - please check the number 1624; L257 and L479 mention different levels of rarefaction
L198 - PRJNA985243 checked and fastq files are available together with clear labelling of individual samples
L202, L208 - the connection of ASVs and OTUs is a little bit confusing here. I understand that DADA2 was used to merge pair-end reads (L202) and maybe to correct errors. Individual sequences were then clustered at 97% threshold. If it is so, please clarify the paragraph. It was not ASV table which was clustered (L208) but rather individual sequences, right?
L207 - please correct typo in kpb
L208 - please consider including the vsearch version
L215-218 - The sentence needs clarification, its meaning is unclear.
L230 - Please check, maybe Figure 4 was meant.
L232 - Please specify core threshold. Is it >=50% in each sample, >=50% of all samples or something else?
L240-L243 - The sentence is duplicated, please correct.
Figure 2 - Please state in the figure caption that "once" means sequenced on one flow cell (L467) and "twice" means sequenced on two flow cells (L468).
L250 - I wonder whether the manufacturer took into account the 16S copy number of the individual genomes present in the mock sample. A taxon with 2 copies of the 16S rRNA gene will show a higher relative abundance in the final sequences than a taxon with one copy. This was probably considered during mock preparation, but it might be one explanation for the changed proportions. However, I am aware that there is a nice-looking barplot with an even distribution of taxa.
L261 - This is side note, but Table 1 partly duplicates information in Figure 3.
L267 - I suggest including the mean abundance of the 11 phyla detected exclusively in the Nanopore data. This might give an idea of the size of this Nanopore-detected subcommunity.
L270 - please omit "only"
L276 - please consider to reorder Figure 4 and Figure 3. In the current version, Fig 3 is referenced after Fig 4.
L282 and L288 - information here is repeated, please correct
L286 - genera
L296 - L305 - For the reader's reference, I suggest including the phylum names of the individual orders in brackets.
L309 - Does the "species rank" means that OTUs served as input into Mantel test? Please clarify.
L311 - Genera and families are arbitrary groups and as such I am not sure if they can enter random forest. I suggest to test the same effects with OTUs which are exactly defined.
L315 - The archaeal sentence sounds a little vague. I suggest including at least the number of phyla detected only by Nanopore.
L323 - L325 - nicely summarised output which applies also in this manuscript
L342 - Please consider to add that another reason might be due to incomplete databases
L346 - What do you mean by maximum resolution? I feel that this sentence needs reformulating.
L359 - L361 - I understand what was meant here but I feel that this sentence needs reformulating.
L362-L364 - Nice key output of the study.
L365 - The ecology of Nanopore-only taxa cannot be inferred from the fact that the rest of the core community was similar between Nanopore and Illumina. For example, Nitrospinota detected by Nanopore might represent low-density nitrifiers with a potentially high impact on N cycling in sediment.
L368-L380 - The last paragraph seems out of context; if you prefer to keep it, please consider mentioning portability in the Introduction. I think the manuscript would be of the same quality even without the portability section.
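Reviewer 1's L250 point about 16S copy number can be made concrete: read proportions reflect gene copies, not genomes, so a genome-level estimate divides each taxon's reads by its copies per genome and renormalises. The sketch below is a minimal illustration; taxon names and copy numbers are hypothetical, and real values would have to come from the mock manufacturer's documentation or a copy-number database such as rrnDB.

```python
from typing import Dict


def copy_number_corrected(read_counts: Dict[str, int],
                          copies: Dict[str, float]) -> Dict[str, float]:
    """Divide each taxon's reads by its 16S copies per genome, then renormalise to proportions."""
    per_genome = {}
    for taxon, reads in read_counts.items():
        n_copies = copies.get(taxon, 0)
        if n_copies <= 0:
            raise ValueError(f"missing or non-positive 16S copy number for {taxon!r}")
        per_genome[taxon] = reads / n_copies
    total = sum(per_genome.values())
    if total == 0:
        raise ValueError("no reads to normalise")
    return {taxon: value / total for taxon, value in per_genome.items()}


# Hypothetical example: 200 vs 100 reads, but 2 vs 1 gene copies per genome
print(copy_number_corrected({"Genus_A": 200, "Genus_B": 100},
                            {"Genus_A": 2, "Genus_B": 1}))
# -> {'Genus_A': 0.5, 'Genus_B': 0.5}
```

A two-fold difference in reads thus corresponds to equal genome abundances, which is the kind of distortion the reviewer suggests checking against the expected mock proportions.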

https://doi.org/10.24072/pci.microbiol.100014.rev11

Reviewed by anonymous reviewer 2, 11 Jul 2023

The present study compares short-read sequencing with long-read sequencing from ONT on environmental (marine) sediment samples. The authors conclude from this comparison that ONT works as well as Illumina while covering even more diversity. The article's writing is okay, and the findings are concisely presented. However, I think the study design, as presented, is not correct, and the conclusions are only partially valid (see below). I have one important methodological question and one important question related to the mock community.

Line 87-88: please give the respective references for RCA and UMI already in this sentence.

Line 105: you may add https://doi.org/10.1093/femsec/fiac120 to the list. I think there are also other studies that increasingly use it.

Line 113: The study design description is not quite accurate: how do the authors disentangle the effects of primers from the effects of sequencing technologies? This cannot be done with these data (except perhaps with an in silico PCR and Illumina simulation on the Nanopore reads). What the authors really compare are short amplicons from primer pair A with long amplicons from primer pair B. Since the study of Parada et al. 2015, we know that even a single nucleotide in one primer can have a tremendous effect on diversity estimates from sequencing. Here, the authors compare different primer sets, which makes the comparison of sequencing technologies somewhat beside the point. It is therefore rather a feasibility study of long-read sequencing with ONT, showing that it produces results similar to those of established primers for short-read sequencers. A direct comparison of numbers (alpha diversity estimates) is not advised, because it is like comparing apples with oranges. I am therefore afraid that the aim and the writing of the manuscript need to be revised accordingly (and rather extensively).
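The in silico PCR the reviewer mentions would start by cutting, from each long read, the region a short-read primer pair would amplify, so that both datasets cover the same marker. The sketch below shows only that extraction step: it searches both read orientations for a forward primer and for the reverse complement of a reverse primer, accepting IUPAC degenerate bases and a small number of substitutions. The function names, the mismatch threshold and the toy primers in the test are hypothetical; indels (common in nanopore reads) are ignored, and the Illumina-error simulation step is not covered.

```python
from typing import Optional

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement, IUPAC-aware."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_primer(seq: str, primer: str, max_mismatch: int, start: int = 0) -> int:
    """Leftmost position >= start where `primer` matches with <= max_mismatch substitutions, else -1."""
    primer = primer.upper()
    k = len(primer)
    for i in range(start, len(seq) - k + 1):
        mismatches = 0
        for p, s in zip(primer, seq[i:i + k]):
            if s not in IUPAC[p]:
                mismatches += 1
                if mismatches > max_mismatch:
                    break
        else:  # inner loop finished without exceeding the mismatch budget
            return i
    return -1


def in_silico_amplicon(read: str, fwd: str, rev: str, max_mismatch: int = 2) -> Optional[str]:
    """Insert between fwd and revcomp(rev), primers trimmed; tries the read and its reverse complement."""
    rev_rc = revcomp(rev)
    for seq in (read.upper(), revcomp(read)):
        f = find_primer(seq, fwd, max_mismatch)
        if f < 0:
            continue
        r = find_primer(seq, rev_rc, max_mismatch, start=f + len(fwd))
        if r >= 0:
            return seq[f + len(fwd):r]
    return None
```

Applied to every long read with the short-read primer pair, this would yield a long-read-derived dataset restricted to the same region, which is one way to separate primer effects from technology effects.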

Methods: with LSK109 and a 10% error rate, clustering at 97% will produce spurious OTUs, even with singletons across all samples removed.
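The reviewer's concern can be illustrated with abundance-sorted greedy centroid clustering at 97% identity, in the spirit of tools such as vsearch. In this minimal sketch, identity is simplified to 1 minus the Levenshtein distance divided by the length of the longer sequence; vsearch instead aligns sequences and offers several identity definitions, so absolute values will differ. The sequences in the test are toy data: a read with 1% error joins its true centroid, while a read with 10% error falls below 97% and founds its own, spurious cluster.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Simplified identity: 1 - edit distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def greedy_cluster(abundance: Dict[str, int], threshold: float = 0.97) -> Dict[str, str]:
    """Map each unique sequence to a centroid; the most abundant sequences become centroids first."""
    centroids: List[str] = []
    membership: Dict[str, str] = {}
    for seq in sorted(abundance, key=lambda s: (-abundance[s], s)):
        for centroid in centroids:
            if identity(seq, centroid) >= threshold:
                membership[seq] = centroid
                break
        else:  # no centroid close enough: this sequence starts a new OTU
            centroids.append(seq)
            membership[seq] = seq
    return membership
```

Because every erroneous read is compared with centroids rather than with the true template's whole error cloud, high per-read error rates inflate the number of OTUs, which is why singleton removal alone may not be sufficient.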

Line 221: I would advise to look into the new publications from Patrick Schloss (doi: https://doi.org/10.1101/2023.06.23.546313) considering this argument, in particular for the alpha diversity estimates, since much weight is put on it in the author’s manuscript

Line 246: The relative proportions of the mock community are one thing, but not really that relevant, since we are dealing with compositional data. More important is matching the number of OTUs to the actual number of taxa in the mock community and detecting all taxa. I can imagine that there are vast differences between the amplicons. Please add these missing results, even if they represent a weak point for Nanopore R9.
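The mock-community summary the reviewer asks for (number of OTUs versus the number of expected taxa, detection of all taxa, and OTUs that match nothing in the mock) can be computed from an OTU-to-genus assignment. The sketch below is a hypothetical helper; the OTU identifiers and genus labels in the test are placeholders, not the composition of the ZymoBIOMICS standard.

```python
from collections import Counter
from typing import Dict, Iterable


def evaluate_mock(otu_genus: Dict[str, str], expected_genera: Iterable[str]) -> Dict[str, object]:
    """Compare OTU assignments with the genera expected in a mock community."""
    expected = set(expected_genera)
    per_genus = Counter(g for g in otu_genus.values() if g in expected)
    spurious = [otu for otu, g in otu_genus.items() if g not in expected]
    return {
        "n_otus": len(otu_genus),                       # OTUs observed
        "n_expected": len(expected),                    # taxa actually in the mock
        "recovered": sorted(per_genus),                 # expected genera detected
        "missed": sorted(expected - set(per_genus)),    # expected genera not detected
        "otus_per_genus": dict(per_genus),              # >1 suggests split or artefactual OTUs
        "spurious_otus": sorted(spurious),              # OTUs matching no expected genus
    }
```

Reporting these counts separately for the short and long amplicons would show whether extra OTUs correspond to real taxa or to artefacts, which is the reviewer's point in the following comment.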

Line 254: This comparison with percentage suggest that more species are better. This is however not the case. An accurate estimate of the taxa in a given sample is important. More OTUs may for instance mean more artifacts, less true taxa.

Line 266: Did Nanopore detect these 11 phyla, or did the primer system detect them? I would argue that the primer pair detected them; ONT is just the tool used to read out these sequences. I suggest revising this terminology by replacing “Nanopore” with, e.g., “long amplicons”, which is more objective.

Line 386: I consider OTU/ASV tables with taxonomic classifications, read abundances per sample, and one representative FASTA file to be mandatory supplementary items. Please add them as an annotated .csv file.
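As a sketch of the supplementary files requested above, the following writes an OTU table (identifier, taxonomy string, one read-count column per sample) as CSV and the representative sequences as FASTA, using only the Python standard library. The record layout (a list of dicts with `id`, `taxonomy`, `sequence` and `counts` keys) and the column names are assumptions for illustration, not a format prescribed by the reviewer or by PCI.

```python
import csv
import io
from typing import Dict, List, TextIO


def write_otu_table(handle: TextIO, otus: List[Dict], samples: List[str]) -> None:
    """One row per OTU: id, taxonomy, then read counts in the given sample order."""
    writer = csv.writer(handle)
    writer.writerow(["otu_id", "taxonomy", *samples])
    for otu in otus:
        writer.writerow([otu["id"], otu["taxonomy"],
                         *(otu["counts"].get(s, 0) for s in samples)])


def write_fasta(handle: TextIO, otus: List[Dict], width: int = 80) -> None:
    """Representative sequence of each OTU, wrapped at `width` characters."""
    for otu in otus:
        handle.write(f">{otu['id']}\n")
        seq = otu["sequence"]
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# Hypothetical usage with in-memory buffers
otus = [{"id": "OTU_1", "taxonomy": "Bacteria;Genus_A", "sequence": "ACGT" * 30,
         "counts": {"S1": 12, "S2": 0}}]
table, fasta = io.StringIO(), io.StringIO()
write_otu_table(table, otus, ["S1", "S2"])
write_fasta(fasta, otus)
```

When writing to real files, opening the CSV with `newline=""` avoids blank lines on some platforms, as recommended in the `csv` module documentation.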

Thanks!

https://doi.org/10.24072/pci.microbiol.100014.rev12