Research Article
Corresponding author: Inmaculada Larena (ilarena@inia.csic.es)
Academic editor: Neriman Yilmaz
© 2025 Elena Requena, Javier Veloso, Eduardo A. Espeso, Inmaculada Larena.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Requena E, Veloso J, Espeso EA, Larena I (2025) Hybrid assembly of Penicillium rubens genomes unveils high conservation of genome structural organisation and the presence of Numts in nuclear DNA. IMA Fungus 16: e145175. https://doi.org/10.3897/imafungus.16.145175
The search for highly accurate chromosomal reference genomes has become a primary objective for the fungal research communities. Various genomic events, including insertions, deletions, inversions and movement of transposable elements, can modify the genomic architecture, resulting in chromosomal rearrangements. Long sequence reads enhance the accuracy and reliability of the assembly procedure, facilitating the study of these genomic characteristics. Here, we have utilised a combination of PacBio and Illumina sequencing technologies to generate hybrid assemblies of Penicillium rubens strains 212 (PO212) and S27. These assemblies were then subjected to a comparative analysis in order to elucidate the chromosomal rearrangements that underpin the observed genomic differences, with a particular focus on their implications in the biocontrol phenotype against phytopathogenic fungi. This approach has enabled us to obtain the assembly of both PO212 and S27 genomes, with each organised into 13 scaffolds. The genomic organisation between these two isolates is highly conserved and the presence of transposable elements between the strains does not reveal major differences. Using the hybrid assemblies, we were able to detect, for the first time in the genus Penicillium, the presence of two nuclear mitochondrial DNA segments (Numts) in the genomes of the PO212 and S27 strains. The differences in biocontrol phenotype displayed by PO212 and S27 strains are independent of their genome organisation. These genomes provide new information for the existing database repositories.
Biocontrol agent, Numts, Penicillium rubens, PO212, S27, whole genome sequencing PacBio
The last few decades have seen a technological evolution in which new sequencing methods have been developed very rapidly, making the sequencing and subsequent assembly processes easier, faster and at lower costs, with a major impact on various fields of research (
In recent years, an increasing number of genomes have undergone large-scale analyses; as a result, the number of genomes from these species deposited at NCBI now exceeds 100, albeit with different levels of assembly quality. One of the best-assembled genomes is that of P2niaD18 (GCA_000710275.1), which is organised into five scaffolds. Consistent with the four chromosomes of Penicillium notatum and P. chrysogenum observed in the pulsed field electrophoresis (
The genomic organisation of microorganisms is subject to continuous change, driven in part by the integration of both exogenous and endogenous DNA. The process by which exogenous DNA is integrated into the nucleus is known as horizontal gene transfer (HGT). HGT has been observed from bacteria to fungi (
The origin of new DNA fragments inserted into the nuclear genome is not always extracellular. Two other cases that add variability to genome organisation are the movement of Transposable Elements (TEs) in the genome and the insertion of endogenous DNA from organelles, such as mitochondrial DNA. The latter category is known as Numts (nuclear mitochondrial DNA segments).
In conjunction with the analysis of structural genomic rearrangements in a genome, it is interesting to examine genes encoding members of large and well-characterised protein families, such as the Carbohydrate-Active enZymes (CAZymes) (
In order to increase the knowledge of structural genomic rearrangements in PO212 compared to S27, we performed PacBio and Illumina sequencing and subsequent hybrid assemblies of the genomes of both strains of P. rubens. These new data provide the basis for comparing structural variations in PO212. Using different strategies, we underscored the genetic similarity in these assemblies, by analysing a large and conserved family of proteins and the presence and number of transposable elements. Notably, we identified two mitochondrial DNA insertions in the nuclear genome in this fungal species.
The P. rubens strains, including PO212 and S27, isolated from diverse agricultural soils and plant samples in Spain, are listed in Table
Strain | Origin | Host | BA † | Reference |
---|---|---|---|---|
PO212 | Spain | Soil | + | |
S27 | Spain (Ávila) | Soil | - | |
S17 | Spain (Segovia) | Soil | - | |
S71 | Spain (Segovia) | Soil | - | |
S73 | Spain (Segovia) | Soil | + | |
CH2 | Spain (Madrid) | Leaf of a perennial plant | - ‡ | |
CH5 | Spain (Madrid) | Shoot of a perennial plant in a field of peach trees | - ‡ | |
CH6 | Spain (Madrid) | Deep soil sample in the field of peach trees | + ‡ | |
CH8 | Spain (Madrid) | Shoot of a perennial plant in a pine forest | + ‡ | |
CH16 | Spain (Lérida) | Outbreak of pruning shoot | + ‡ | |
For total DNA extraction of PO212 and S27 strains, cultures were grown in liquid minimal medium (MM;
Macrogen (Korea) performed PacBio and Illumina sequencing. The integrity of the genomic DNA was analysed and DNA libraries were constructed on SMRT (single molecule real time sequencing) Library (20 kb). The sequencing procedure was carried out on SMRT Cell Run in Sequel I. For error correction, library construction was performed with Nextera DNA XT and sequencing was performed on an Illumina NovaSeq6000 platform using 150 base pair (bp) paired-end reads. The DNA sequencing data analysed in this study were deposited at the National Center for Biotechnology Information (NCBI) under BioProject PRJNA887566.
Assembly and Scaffolding
Reads obtained from sequencing were subjected to quality analysis before proceeding with the assemblies. The assemblies of PO212 and S27 genomes were conducted using Flye (v.2.4.2) software, employing the default parameters (minimum overlap auto (3k–10k), with a 20% error, Haplotype mode activated, one polishing by default) (
After initial assemblies were obtained and errors corrected using short reads from Illumina sequencing, a manual curation was performed to reduce the number of scaffolds while improving the assemblies using two different strategies. The first strategy consisted of the comparison of sequences of the 5’ and 3’ ends of scaffolds that were predicted to overlap, based on the information provided by the dot-plot. As an example, scaffolds PO212_1 (previously reverse complemented) and PO212_11 were joined following this strategy. The second strategy involved the search for overlapping regions between the sequences of the 5’ and 3’ ends of all scaffolds, despite the absence of evidence in the dot-plot. For instance, following this approach, we assembled scaffold PO212_3 with scaffold PO212_17. To be considered for merging, two scaffolds had to overlap at least 1,500 bp. Upon identifying overlaps, the fasta sequence files were manually reconstructed and specific oligonucleotides were designed to flank these reassembled regions.
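The end-overlap search described above can be illustrated with a minimal Python sketch. It is not the procedure used in this work: it assumes exact matches between the 3’ end of one scaffold and the 5’ end of the next (real overlaps may contain mismatches and would normally be found by alignment), and the function names and the 10 kb search window are hypothetical choices. Only the 1,500 bp threshold is taken from the text.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement, for scaffolds assembled on the opposite strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def end_overlap(left: str, right: str, min_overlap: int = 1500,
                window: int = 10_000) -> int:
    """Length of the longest exact match between the 3' end of `left` and
    the 5' end of `right`; 0 if it is shorter than `min_overlap` (>= 1).

    Only the last/first `window` bases are inspected (an assumed limit).
    """
    tail, head = left[-window:], right[:window]
    for length in range(min(len(tail), len(head)), min_overlap - 1, -1):
        if tail[-length:] == head[:length]:
            return length
    return 0


def merge_scaffolds(left: str, right: str, min_overlap: int = 1500,
                    window: int = 10_000):
    """Join two scaffolds through their overlap, keeping one copy of it.
    Returns None when no sufficient overlap exists."""
    length = end_overlap(left, right, min_overlap, window)
    if length == 0:
        return None
    return left + right[length:]


# Toy example with a lowered threshold so the sequences stay readable.
print(merge_scaffolds("TTTTTGATTACA", "GATTACACCCCC", min_overlap=4))
```

A scaffold that has to be flipped first, as was done for PO212_1, would be passed through `reverse_complement` before the comparison. Any join proposed this way would still need the read-mapping and PCR verification described in the following paragraph.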
In order to evaluate the quality of these manually merged regions, raw reads were mapped against each genome and the final assembled version of scaffolds was verified experimentally using PCR techniques. The oligonucleotides utilised in these amplifications are enumerated in Suppl. material
A Telomere Identification toolKit (tidk) was used to identify telomeric repeats (
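As an illustration of what a telomeric repeat search involves, the sketch below counts the TTAGGGC repeat (the unit shown in the synteny figure) and its reverse complement in consecutive windows of a sequence. It is a simplified stand-in, not tidk itself; the function name and the 10 kb window are assumptions, and repeats that straddle a window boundary are not counted.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_repeat_windows(seq: str, repeat: str = "TTAGGGC",
                         window: int = 10_000):
    """Return (window start, repeat count) pairs along `seq`.

    Both the repeat and its reverse complement are counted, because a
    telomere at the other scaffold end appears on the opposite strand.
    """
    seq = seq.upper()
    reverse = repeat.translate(_COMPLEMENT)[::-1]
    counts = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        counts.append((start, chunk.count(repeat) + chunk.count(reverse)))
    return counts


# Toy scaffold: five repeats at one end, three reverse-complemented
# repeats at the other, separated by filler sequence.
toy = "TTAGGGC" * 5 + "A" * 100 + "GCCCTAA" * 3
print(count_repeat_windows(toy, window=1000))  # [(0, 8)]
```

Windows at scaffold ends with many repeats would be the candidates for telomeres; windows in the middle of a scaffold would be expected to score zero.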
In order to obtain the assembled mitochondrial genome for the PO212 and S27 strains, Illumina raw reads were mapped to the 27 kb P2niaD18 mitochondrial genome (GCA_000710275.1) (
A comparison of PO212 and S27 assemblies for determining sequence similarity was conducted using CLC Genomics Workbench 22.0.2. Furthermore, sequence homology was determined using BLAST with a cut-off E value of 1E-10. The use of homologous loci facilitated the mapping of the relative positions and directions of scaffolds along the PO212 and S27 scaffolds. Sequences that exhibited at least 30 consecutive homologous loci were considered homologous regions. The coordinates of these homologous regions were then merged and flattened using a custom Python script and visualised using Circos v. 0.69 (
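The rule of at least 30 consecutive homologous loci, followed by merging and flattening of coordinates, can be sketched as follows. This is not the custom script used in this study; the data layout (gene order on the reference scaffold, query scaffold name, gene order on the query scaffold) and the function names are assumptions made for illustration.

```python
from typing import List, Tuple

# (order of the gene on the reference scaffold, query scaffold name,
#  order of the homologous gene on that query scaffold)
Locus = Tuple[int, str, int]


def collinear_runs(loci: List[Locus], min_loci: int = 30) -> List[List[Locus]]:
    """Split loci (sorted by reference order) into runs of consecutive,
    collinear homologues and keep runs with at least `min_loci` members."""
    runs, current, step = [], [], 0
    for locus in loci:
        if current:
            prev = current[-1]
            delta = locus[2] - prev[2]
            same_block = (
                locus[0] == prev[0] + 1   # adjacent on the reference
                and locus[1] == prev[1]   # same query scaffold
                and delta in (1, -1)      # adjacent on the query
                and step in (0, delta)    # orientation does not flip
            )
            if same_block:
                current.append(locus)
                step = delta
                continue
            if len(current) >= min_loci:
                runs.append(current)
        current, step = [locus], 0
    if len(current) >= min_loci:
        runs.append(current)
    return runs


def flatten(intervals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping (start, end) coordinate intervals."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


forward = [(i, "S27_1", 100 + i) for i in range(35)]
short = [(35 + i, "S27_2", 50 - i) for i in range(10)]
inverted = [(45 + i, "S27_3", 200 - i) for i in range(30)]
print([len(run) for run in collinear_runs(forward + short + inverted)])  # [35, 30]
```

In this toy input the 10-locus block is discarded because it falls below the threshold, while the inverted block of 30 loci is retained; the orientation of each run (step of +1 or -1) is the information that a synteny plot would use to mark a scaffold as direct or inverted.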
RepeatMasker version 4.0.7 (
To predict the genes contained within genome assemblies, the homology-based gene prediction programme, Gene Model Mapper (GeMoMa) (
A phylogenetic tree was generated using the Maximum Likelihood method and the Tamura-Nei model with 1,000 replicates (
AA Auxiliary activity
BA Biocontrol activity
CAZymes Carbohydrate-active enzymes
COG Clusters of orthologous groups
DSBs Double-strand breaks
ITS Internal transcribed spacer
MM Minimal medium
NHEJ Non-homologous end joining
ORF Open reading frame
PDA Potato dextrose agar
SNV Single nucleotide variation
TEs Transposable elements
PacBio and Illumina technologies were utilised to sequence the genomes of P. rubens strains PO212 and S27, facilitating genomic analysis given the interest in the BA of PO212. The result of the automatic assembly process yielded 23 and 20 scaffolds, respectively (Fig.
Data from hybrid assemblies followed by manual curation. Number of scaffolds, length and ORFs in the PO212 and S27 genomes are shown.
PO212: manually merged scaffold † | PO212: overlapping regions (kb) ‡ | PO212: final scaffold number | PO212: size (bp) | PO212: no. of predicted ORFs | S27: manually merged scaffold | S27: overlapping regions (kb) ‡ | S27: final scaffold number | S27: size (bp) | S27: no. of predicted ORFs |
---|---|---|---|---|---|---|---|---|---|
1+11+3+17 | 4.6/2.5/4.0 | 1 | 9,478,687 | 3,244 | 1+9+11 | 3.1/3.4 | 1 | 9,488,521 | 3,252 |
12+6+8 | 4.8/3.5 | 2 | 4,548,765 | 1,537 | 3 † | - | 2 | 4,123,581 | 1,391 |
2+10 | 4.7 | 3 | 4,107,425 | 1,403 | 4+5 | 1.8 | 3 | 2,083,268 | 717 |
15+4 | 1.6 | 4 | 3,513,604 | 1,211 | 6 † | - | 4 | 1,895,371 | 642 |
7+13+14 | 4.5/3.2 | 5 | 3,324,254 | 1,151 | 7 † | - | 5 | 2,933,455 | 1,019 |
5 † | - | 6 | 2,396,790 | 822 | 8 † | - | 6 | 5,740,421 | 1,974 |
9+16 | 4.0 | 7 | 1,873,659 | 666 | 10 † | - | 7 | 1,884,646 | 671 |
18 † | - | 8 | 355,628 | 133 | 12 † | - | 8 | 592,447 | 199 |
19 † | - | 9 | 110,255 | 34 | 14+13 | 2.1 | 9 | 445,038 | 149 |
20 † | - | 10 | 74,744 | 20 | 15+2 | 2.2 | 10 | 471,198 | 171 |
21 † | - | 11 | 56,873 | 17 | 16 † | - | 11 | 128,065 | 44 |
22 † | - | 12 | 47,614 | 18 | 17 † | - | 12 | 81,615 | 30 |
23 † | - | 13 | 6,144 | 1 | 18 † | - | 13 | 74,708 | 20 |
Manual curation of the PO212 and S27 assemblies: A whole-genome dot-plot between P. rubens strains 212 (PO212) and S27 using version 1 of the assemblies before manual curation. The PO212 assembly (x-axis) was utilised as the reference. Green circles mark the start of scaffold 1. Red circles indicate the start of the scaffolds designed for merging. The blue boxes on the x- and y-axis correspond to the following: *1: S27 scaffolds ordered from 9 to 20 and *2: PO212 scaffolds ordered from 9 to 23; B scheme of the overlapping region between the end of one scaffold and the start of another, together with the position of the designed oligonucleotides; C PCR product to verify the continuity of the regions in the PO212 and S27 strains. P indicates PO212 genomic DNA and S indicates S27 genomic DNA. The order from left to right is PO212 (2+10 = 5.5 kb, 1+11 = 5.3 kb, 13+14 = 4 kb, 6+8 = 4.3 kb, 7+13 = 4.9 kb, 9+16 = 5 kb, 3+17 = 4.9 kb, 4+15 = 2.3 kb, 12+6+8 = 5.6 kb and 1+11+3+17 = 5.5 kb). S27 (14+13 = 3.2 kb, 4+5 = 4.2 kb, 15+2 = 4.2 kb, 1+9 = 6.7 kb and 9+11 = 5.3 kb). Mw: Molecular weight marker. The black arrows indicate the band sizes of 5, 3 and 2 kb from top to bottom; D whole-genome dot-plot between strains PO212 and S27 using version 2 of assemblies after manual curation. The PO212 assembly (x-axis) was utilised as a reference. The green circle indicates the start of scaffold 1 in both genomes. The blue boxes on the x- and y-axis correspond to the following: *3: S27 scaffolds 7, 10, 11, 12 and 13 and *4: PO212 scaffolds 7, 8, 9, 10, 11, 12 and 13.
Utilising the hybrid assemblies, we analysed the presence of differences in structural genomic organisation between both strains. Features, sizes and other data from the newly-assembled genomes and those of strains P2niaD18 and Wisconsin 54-1255 are shown in Table
Genomic characteristics of the newly-sequenced strains PO212 and S27 together with P2niaD18 and Wisconsin 54-1255.
Assemblies | PO212 | S27 | P2niaD18 | Wisconsin 54-1255 |
---|---|---|---|---|
GenBank code | (GCA_027256995.2) | (GCA_027257005.2) | (GCA_000710275.1) | (GCA_000226395.1) |
Size (Mb) | 29.89 | 29.94 | 32.52 | 32.22 |
# Scaffolds | 13 | 13 | 5 | 49 |
Largest scaffold (Mb) | 9.47 | 9.48 | 13.59 | 6.38 |
N50 (Mb) | 4.1 | 5.74 | 10.45 | 3.9 |
L50 | 3 | 2 | 2 | 4 |
GC content % | 49.02 | 49.02 | 48.95 | 48.96 |
# Gene | 10,257 | 10,279 | 11,839 | 12,943 |
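The N50 and L50 values for PO212 and S27 can be recomputed from the final scaffold sizes listed in the table of manually curated scaffolds. The short sketch below uses the usual definitions (N50 is the length of the scaffold at which the cumulative size, taken from the largest scaffold downwards, reaches half of the assembly; L50 is the number of scaffolds needed to get there); the function name is ours.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths in bp."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no scaffold lengths given")


# Final scaffold sizes (bp) copied from the curated-scaffold table.
PO212 = [9_478_687, 4_548_765, 4_107_425, 3_513_604, 3_324_254, 2_396_790,
         1_873_659, 355_628, 110_255, 74_744, 56_873, 47_614, 6_144]
S27 = [9_488_521, 4_123_581, 2_083_268, 1_895_371, 2_933_455, 5_740_421,
       1_884_646, 592_447, 445_038, 471_198, 128_065, 81_615, 74_708]

print(sum(PO212), n50_l50(PO212))  # 29894442 (4107425, 3)
print(sum(S27), n50_l50(S27))      # 29942334 (5740421, 2)
```

These figures agree with the assembly sizes (29.89 and 29.94 Mb), N50 values (4.1 and 5.74 Mb) and L50 values (3 and 2) reported in the table above.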
Synteny amongst the S27 genome (left) and the PO212 genome (right). The scaffolds corresponding to the PO212 strain have been delineated in 13 different colours on the right and are labelled as in the genome assembly. The external track, situated in close proximity to the scaffolds, displays the quantity of telomeric repeats TTAGGGC. The scale utilised for quantifying the number of repeats ranges from 0 to 20, with the green-coloured line denoting seven repeats in each telomeric repeat search window/range. The grey bars represent a track indicating the number of repetitive elements in 120 kb ranges. On the left, the S27 scaffolds are linked to the PO212 scaffolds by lines representing regions of high sequence similarity. High sequence similarity regions have at least 30 consecutive genes homologous to PO212. In black are labelled those S27 scaffolds whose nt sequences are in the same direction as the aligned PO212 sequence, while those in white are in the inverted direction as the aligned PO212 sequence. Numerical labels indicate the nucleotide of the break junction between rearranged regions.
In contrast to the conserved genomic structure between PO212 and S27, dot-plots of the PO212-P2niaD18 and PO212-Wisconsin 54-1255 assemblies showed notable differences in genome organisation, indicating the presence of genomic rearrangements between these assemblies (Fig.
Synteny between the P2niaD18 genome (left) and the PO212 genome (right) (A, C) and between the Wisconsin 54-1255 genome (left) and the PO212 genome (right) (B, D): A whole-genome dot-plot between P. rubens strain 212 (PO212) and P2niaD18 genome (GCA_000710275.1). The PO212 assembly (x-axis) was utilised as a reference; B whole-genome dot-plot between P. rubens strain 212 (PO212) and Wisconsin 54-1255 genome (GCA_000226395.1). The PO212 assembly (x-axis) was utilised as a reference; C scaffold ordering of the P2niaD18 genome (left) to the scaffolds of the PO212 genome (right); D scaffold ordering of the Wisconsin 54-1255 genome (left) to the scaffolds of the PO212 genome (right). The PO212 scaffolds are illustrated on the right in 13 different colours, with the labelling corresponding to that employed in the genome assembly. The grey bars correspond to a track showing the number of repetitive elements in 120 kb ranges. On the left, the scaffolds of P2niaD18 (C) and Wisconsin 54-1255 (D) are linked to the PO212 scaffolds by lines representing regions of high sequence similarity. Numerical labels indicate the nucleotide of the break junction between rearranged regions.
To identify any discrepancies between the PO212 and S27 assemblies, gene prediction and functional annotation of the proteins were carried out using the OmicsBox platform. The predicted proteomes were organised into 25 clusters, which were subsequently classified into four main groups according to the distribution of the Clusters of Orthologous Groups (COG) categories: Metabolism, Cellular Processes and Signalling, Information Storage and Processing and Poorly Characterised (Fig.
A comparative analysis of the identified proteins in the genomes of three P. rubens strains, PO212, S27 and P2niaD18, was conducted: A category abundance of proteins grouped into 25 functional categories using COG classification in PO212, S27 and P2niaD18; B enzyme distribution of identified proteins, based on EC class in the annotated genomes of the P. rubens strains PO212, S27 and P2niaD18.
The search for CAZymes was performed using the automated CAZymes annotation web server dbCAN 3 (
Using the CAZymes of PO212 and P2niaD18 as a means to examine genetic variability, a comparison was made between these protein sets in the two strains (Suppl. material
A Phylogenetic relationship of the genes encoding mutanases from PO212 and P2niaD18 using MEGA11. The phylogenetic tree was built using six mutanase-encoding genes from PO212 (PO212g058370, PO212g085410, PO212g101870, PO212g096930, PO212g013760 and PO212g033560) (red) and the corresponding mutanase-encoding genes from P2niaD18 (KZN91491, KZN92410, KZN92305 and KZN94461) (green). The sequence of a glycosyl transferase (PO212g014370) was utilised as the root (purple). Bootstrap values are indicated adjacent to the nodes. The tree was inferred by Maximum Likelihood and is drawn to scale, with branch lengths measured as the number of substitutions per site. The codon positions included in the analysis were first+second+third+non-coding. The final dataset comprised 4060 positions; B schematic representation of PO212g085410 mutanase and the homologue in P2niaD18 (KZN92410); C PO212g101870 mutanase and the homologue region in P2niaD18; D PO212g096930 mutanase and the homologue in P2niaD18 (KZN92305); E PO212g013760 mutanase and the homologue region in P2niaD18; F PO212g033560 and the homologue in P2niaD18 (KZN94461); G PO212g058370 and the homologue in P2niaD18 (KZN91491). The flanking genes that are homologous between PO212 and P2niaD18 are coloured in grey. Red indicates PO212 mutanases. Blue indicates genes found only in PO212. P2niaD18 mutanases are shown in green. Genes that are unique to P2niaD18 are shown in yellow. The prefix PO212 has been removed from the gene names to improve the legibility of the figure, so the gene names begin with the letter “g”.
The incorporation of the CAZymes proteins of P2niaD18 in the analysis demonstrated the high conservation amongst the proteomes of the strains and further reinforced the strong resemblance between PO212 and S27 strains (Suppl. materials
The search for differences between these two very similar strains, PO212 and S27, continued with the detection of single nucleotide variations (SNVs). For this, filtered short (Illumina) reads from S27 (the reads left after removing adapters, filtering by quality, removing duplicates and filtering by length) were mapped to the final version of the PO212 genome. We detected 161 variations between the two genomic sequences. Of these, 125 were localised in non-coding regions. Of the 36 variations in ORFs, only 25 would predictably cause an amino acid change. The frequency of the nucleotide change (the number of reads carrying a given base at a position, relative to the total number of reads that map to that position) in these 25 variations ranged from 65.31% to 100%. In 11 of them, the change was fixed or nearly fixed, with a frequency of 100% in 10 variations and 98.95% in the remaining one.
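The frequency defined above is simply the share of reads supporting a base at one position. A minimal sketch, assuming the bases observed at a position are available as a string (one character per mapped read); the function name and the example pileup are hypothetical and do not come from the sequencing data.

```python
from collections import Counter


def variant_frequency(pileup: str, base: str) -> float:
    """Percentage of reads at one position that carry `base`.

    `pileup` holds one character per read covering the position.
    """
    counts = Counter(pileup.upper())
    depth = sum(counts.values())
    if depth == 0:
        raise ValueError("no reads cover this position")
    return 100 * counts[base.upper()] / depth


# Hypothetical position covered by 20 reads, 19 of which carry G.
print(variant_frequency("G" * 19 + "T", "G"))  # 95.0
```

A frequency of 100% means that every read mapped to the position carries the alternative base, which is what would be expected for a true difference between two haploid genomes.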
The same results were obtained when the reverse analysis was performed, mapping the PO212 reads to the S27 reference genome. Confirmation was thus provided for the presence of six of the 11 variations already described in a previous study (Table
Single nucleotide variations (SNVs) between the PO212 and S27 genomes that showed a frequency of 100%.
Code in PO212 | Code in S27 | Triplet PO212/S27 (5’- 3’) | Amino acid change |
---|---|---|---|
PO212g022060 | S27g022120 | TGC/GGC | Cys605Gly† |
PO212g031060 | S27g031140 | GTC/ATC | Val354Ile |
PO212g034660 | S27g033250 | GGT/AGT | Gly135Ser† |
PO212g041380 | S27g039990 | CGA/TGA | Arg327†, ‡ |
PO212g046560 | S27g045170 | CCT/CAT | Pro219His† |
PO212g052230 | S27g050830 | CTG/CCG | Leu283Pro |
PO212g052600 | S27g051200 | GAT/GGT | Asp1447Gly† |
PO212g055060 | S27g053660 | CAC/CGC | His949Arg |
PO212g063880 | S27g062120 | TGC/GGC | Cys86Gly |
PO212g077520 | S27g081900 | ACT/GCT | Thr19Ala |
PO212g079090 | S27g083470 | TCC/CCC | Ser271Pro† |
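The amino acid changes in the table above follow directly from the listed triplets. The sketch below translates each PO212/S27 codon pair under the standard genetic code, which is assumed here for these nuclear genes; the function and variable names are ours.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
    "*": "Stop",
}


def amino_acid_change(ref_codon: str, alt_codon: str):
    """Return the residues encoded by the reference and variant codons."""
    ref = THREE_LETTER[CODON_TABLE[ref_codon.upper()]]
    alt = THREE_LETTER[CODON_TABLE[alt_codon.upper()]]
    return ref, alt


# Triplets (PO212, S27) copied from the table, in row order.
TRIPLETS = [("TGC", "GGC"), ("GTC", "ATC"), ("GGT", "AGT"), ("CGA", "TGA"),
            ("CCT", "CAT"), ("CTG", "CCG"), ("GAT", "GGT"), ("CAC", "CGC"),
            ("TGC", "GGC"), ("ACT", "GCT"), ("TCC", "CCC")]

for po212, s27 in TRIPLETS:
    print(po212, s27, *amino_acid_change(po212, s27))
```

Under this code, the CGA/TGA pair is the only one in which the S27 triplet is a stop codon; all other pairs are amino acid substitutions matching the last column of the table.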
Strain | BA † | g031060§ | g052230§ | g063880§ | g055060§ | g077520§ |
---|---|---|---|---|---|---|
PO212 | + | G | T | T | A | A |
S27 | - | A | C | G | G | G |
S17 | - | A | C | G | G | G |
S71 | - | A | C | G | G | G |
S73 | + | A | C | G | G | G |
CH2 | -‡ | A | C | G | G | G |
CH5 | -‡ | A | C | G | G | G |
CH6 | +‡ | A | T | G | G | A |
CH8 | +‡ | A | C | G | G | G |
CH16 | +‡ | A | T | G | G | A |
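One way to read the genotype table above is to ask whether any of the five loci separates the strains with biocontrol activity (BA +) from those without it (BA -). The sketch below is only an illustrative reading of the table: it takes the BA column at face value, and the function name and the criterion (no allele shared between the two groups) are our assumptions.

```python
LOCI = ["g031060", "g052230", "g063880", "g055060", "g077520"]

# strain: (BA phenotype, alleles at the five loci in the order of LOCI),
# copied from the table above.
STRAINS = {
    "PO212": ("+", "GTTAA"),
    "S27": ("-", "ACGGG"),
    "S17": ("-", "ACGGG"),
    "S71": ("-", "ACGGG"),
    "S73": ("+", "ACGGG"),
    "CH2": ("-", "ACGGG"),
    "CH5": ("-", "ACGGG"),
    "CH6": ("+", "ATGGA"),
    "CH8": ("+", "ACGGG"),
    "CH16": ("+", "ATGGA"),
}


def segregating_loci(strains, loci):
    """Loci at which BA+ and BA- strains share no allele."""
    result = []
    for i, locus in enumerate(loci):
        positive = {alleles[i] for ba, alleles in strains.values() if ba == "+"}
        negative = {alleles[i] for ba, alleles in strains.values() if ba == "-"}
        if positive.isdisjoint(negative):
            result.append(locus)
    return result


print(segregating_loci(STRAINS, LOCI))  # []
```

With the values in the table, no locus meets this criterion, because S73 and CH8 show biocontrol activity while carrying the same alleles as S27 at all five positions.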
Following a thorough analysis of the structural genome organisation, which revealed elevated levels of structural conservation and the absence of significant rearrangements, a repetitive sequence analysis was conducted on the PO212 and S27 assemblies. These results were then compared with a similar analysis of the P2niaD18 and Wisconsin 54-1255 assemblies (Fig.
Two regions with a very high number of reads were found when mapping the PO212 and S27 reads to their own assemblies (Fig.
Identification of Numts in Penicillium rubens: A PO212 and S27 raw reads were mapped to the PO212 genome in scaffold 2 between coordinates 4,488,480–4,498,480; B reads from PO212 and S27 sequencing that did not map to the mitochondrial genome of P2niaD18, were mapped to the PO212 genome in scaffold 2 between coordinates 4,488,480–4,498,480; C schematic representation illustrates the oligonucleotides mapping and predicted amplicon size. Green circles indicate the gene number: 1- PO212g047600, 2- PO212g047610, 3- PO212g047620, 4- PO212g047630. IGV images; D the PCR product of PCR 1 (Mit 1F and Mit 1R) and PCR 2 (Mit 2F and Mit 2R). The amplified fragment is 4.3 kb for PCR 1 and 2 kb for PCR 2, corresponding to the predicted size of the amplicon in both strains, PO212 and S27. C- Corresponds to the negative control of the PCR. Mw: Molecular weight marker; E PCR product of PCR 3 (Mit 1F and Mit 2R) and PCR 4 (same oligonucleotides as PCR 3). The amplified fragment for PCR 3 is greater than 10 kb for both strains, PO212 and S27. No PCR product was detected in PCR 4. C- Corresponds to the negative control of the PCR. Mw: Molecular weight marker; F PCR product of PCR 3 in other P. rubens DNA strains, where an amplicon greater than 10 kb can be observed for all strains except CH5 (2.5 kb).
In order to verify the mitochondrial nature of these two regions, the sequencing reads from both the PO212 and S27 strains were mapped to the mitochondrial genome of strain P2niaD18 (GCA_000710275.1), the best-assembled genome of the P. rubens clade. Reads that failed to map to the P2niaD18 mitochondrial genome were subsequently mapped to the PO212 genome. Following the elimination of mitochondrial reads, two gaps, one measuring 10 kb and the other 500 bp, were identified in the alignment of the reads to the PO212 assembly (Fig.
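The gap search described here amounts to finding stretches of the nuclear assembly that lose all read support once mitochondrial reads are removed. A minimal sketch, assuming a per-base depth vector for one scaffold is already available (for example, exported from a read-mapping tool); the function name, the half-open 0-based coordinates and the toy depth values are assumptions, not the analysis performed in this study.

```python
from typing import List, Sequence, Tuple


def zero_coverage_gaps(depth: Sequence[int],
                       min_length: int = 1) -> List[Tuple[int, int]]:
    """Return half-open, 0-based (start, end) intervals where the read
    depth is zero for at least `min_length` consecutive positions."""
    gaps = []
    start = None
    for pos, value in enumerate(depth):
        if value == 0 and start is None:
            start = pos                      # a gap opens here
        elif value != 0 and start is not None:
            if pos - start >= min_length:    # the gap closes; keep if long enough
                gaps.append((start, pos))
            start = None
    if start is not None and len(depth) - start >= min_length:
        gaps.append((start, len(depth)))     # gap running to the scaffold end
    return gaps


# Toy depth profile: covered, a 500-position gap, covered, a 20-position gap.
toy_depth = [30] * 100 + [0] * 500 + [30] * 50 + [0] * 20 + [30] * 10
print(zero_coverage_gaps(toy_depth, min_length=100))  # [(100, 600)]
```

Regions that are heavily covered before filtering and empty afterwards are the candidates for mitochondrial insertions; the PCR assays across the junctions remain the actual confirmation.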
The largest Numt (10 kb) is found in both PO212 and S27. In PO212, it is located in scaffold 2 between coordinates 4,488,480–4,498,480 and, in the S27 assembly, between coordinates 4,048,300–4,058,300 of scaffold 2. The flanking genes of this region are the PO212g047600 (1) and PO212g047610 (2) genes on one side and the genes PO212g047620 (3) and PO212g047630 (4) on the other side. The equivalent flanking genes in S27 are S27g046190 and S27g046200 on one side and S27g046210 and S27g046220 on the other side. Due to the proximity of the flanking gene PO212g047610 and its homologue in S27 (S27g046200) to the inserted mitochondrial region, putative homologues were identified in P2niaD18 (KZN83398) and Wisconsin 54-1255 (Pc16g00290). The PO212g047610, S27g046200 and Pc16g00290 gene models predict a different translation start site from that of KZN83398. In the latter, the ATG is predicted to occur at an upstream position, resulting in the addition of 103 amino acids relative to the protein sequence encoded by the other three loci. Analysis of putative homologues in other Penicillium species supports the hypothesis that this second ATG could also function as a translation initiation site for these gene models.
The second Numt of 500 bp detected is located on scaffold 5, between coordinates 3,232,580–3,233,080 in the PO212 assembly. Genes PO212g085220 (5) and PO212g085230 (6) are located on one side of the Numt, whilst PO212g085240 (7) and PO212g085250 (8) are located on the other side (Suppl. material
The presence of these two Numts was verified by PCR in other P. rubens strains used in this study (Table
Following the confirmation of the presence of Numts in the PO212 and S27 genomes, the P2niaD18 and Wisconsin 54-1255 genomes were subjected to further analysis. However, as these strains were not available in the laboratory, PCR analysis could not be carried out. Consequently, a genome check was performed as an alternative. For this approach, the PO212 and S27 reads that did not map to the P2niaD18 mitochondrial genome were mapped to the P2niaD18 and Wisconsin 54-1255 genomes. As demonstrated in Fig.
Coverage of PO212 and S27 reads on the P2niaD18 and Wisconsin 54-1255 assemblies: A alignment of PO212 and S27 reads to the P2niaD18 genome, on chromosome IV spanning coordinates 71,228–85,548. The gene number is indicated by the colour green (from left to right): 4- KZN83396, 3- KZN83397, 2- KZN83398, 1- KZN83399; B PO212 and S27 reads mapped to the Wisconsin 54-1255 genome in AM920431.1:55,537–73,456. Green circles are used to indicate the gene number (from left to right): 4- Pc16g00260, 3- Pc16g00270, n- Pc16g00280, 2- Pc16g00290 and 1- Pc16g00300. IGV images; C schematic representation of the organisation of this region in the PO212, S27, P2niaD18 and Wisconsin 54-1255 genomes.
The advent of sequencing technologies (
In a previous study (
The genome organisation presented in the synteny diagram between PO212 and S27 assemblies does not show any reorganisations between the genomes, such as translocations or inversions within scaffolds. The breakpoints observed between the two genomes could not be joined due to the parameters imposed during the assembly procedures. Regions exhibiting low read coverage were found to be inadequate in providing sufficient evidence for the closure of gaps between scaffolds. However, further analysis in these regions confirmed that there was no loss of information between the two assemblies. By contrast, a high degree of genome rearrangement is observed when the PO212 assembly is compared with those of P2niaD18 and Wisconsin 54-1255. This high degree of rearrangement is likely linked to the mutagenesis process used to enhance penicillin production (
With regard to the presence of repetitive sequences and TEs, the comparison between PO212 and S27 revealed minimal differences, suggesting that the two strains under study carry a very similar number of TEs. However, a greater discrepancy was observed in the predictions for P2niaD18 and Wisconsin 54-1255. This discrepancy can be attributed, at least in part, to the geographical origins of the industrial strains. The TEs appear to be homogeneously dispersed along the scaffolds, despite the existence of varying levels of preference for their insertion within the genome, as highlighted by
In the context of ongoing research investigating the potential for endogenous DNA to undergo modification, the Numts were identified. Long-read sequencing with PacBio technology, combined with the decision not to remove the mitochondrial reads before assembling the genomes, enabled the detection of Numts, defined as segments of mitochondrial DNA that have been inserted into the nuclear genome (
A prediction of the CAZymes was made from the current assemblies. This group is quite large and can provide insight not only into the annotation of carbohydrate-active enzymes, but also into the strong resemblance between the two strains studied. The search for discrepancies between the predictions of the two assemblies returned no results after the verification of the artefacts by the predictors. In this study, we analysed some differences that we have identified as artefacts between PO212 and S27. These predictions are essential for analysing an organism’s capacity for encoding different proteins. Nevertheless, it is imperative to verify the automated predictions. Failure to review such predictions can lead to the accumulation of errors in subsequent processes (
A slight increase in the number of predicted genes was observed in comparison with previous assemblies, which may be attributable to a reduced fragmentation of the genomes by long read sequencing (
The use of long-read sequencing technologies and assembly work has yielded two assembled genomes of P. rubens strains, adding to the genomes stored in the databases. The PO212 and S27 assemblies are distinguished by their remarkable genomic conservation, exhibiting no discernible structural rearrangements, despite the spatial and temporal variations in the strains obtained from soil samples. The analysis of these assemblies revealed the presence of transposable elements dispersed throughout the genome, with a distribution that underscores the genetic similarity between the two study strains. In addition, the presence of mitochondrial sequences inserted into the nuclear genome of both strains was observed. The integration of these sequences appears to be a unique event, as these and other strains analysed show integration in the same region of the genome; this is the first time this has been observed in the genus Penicillium. Extending this analysis to other strains could provide further insights into the insertion process of these mitochondrial sequences into the nuclear genome. On the other hand, the CAZymes prediction highlights the remarkable similarity of these two strains, which differ in their activity in the biological control of plant diseases, and underlines the need to verify the differences. Further analysis, including transcriptomic studies and intergenic variations, would enhance our understanding of the BA of PO212.
The authors wish to thank Y. Herranz and M. Villarino for their support and collaboration at the INIA-CSIC.
The authors have declared that no competing interests exist.
No ethical statement was reported.
All the fungal strains used in this study have been legally obtained, respecting the Convention on Biological Diversity (Rio Convention).
RTA2017-00019-C03-01 (Plan Nacional de I+D, MICIU, Spain) and PID2021-123594OR-C21 (MCIN/AEI/10.13039/501100011033/FEDER, UE). E. Requena received a scholarship (PRE2018-086890) from the MICIU associated with grant RTA2017-00019-C03-01. ER has also received a postdoctoral contract associated with grant PID2021-123594OR-C21.
EE and IL conceived and designed the experiments. ER performed the experiments. ER and JV carried out the bioinformatics analysis. ER analysed the data. ER, EE and IL wrote the original draft manuscript. All authors contributed to the article and approved the submitted version.
Elena Requena https://orcid.org/0000-0001-5028-8880
Javier Veloso https://orcid.org/0000-0002-7283-769X
Eduardo A. Espeso https://orcid.org/0000-0002-5873-6059
Inmaculada Larena https://orcid.org/0000-0001-8424-8916
The datasets generated for this study can be found in online repositories (https://www.ncbi.nlm.nih.gov). Nucleotide sequence data reported are available in the NCBI databases under submission numbers: JAPDLE000000000.2, JAPDLD000000000.2.
Identification of Numts in P. rubens
Data type: png
Identification of inserted mitochondrial regions into genomic DNA
Data type: png
Mapped reads of different parts of the PO212 and S27 genomes
Data type: png
Oligonucleotides designed in this work
Data type: xlsx
Prediction of PO212, S27 and P2niaD18 CAZymes using DIAMOND and HMMER databases
Data type: xlsx
Comparison of CAZymes between PO212 and S27 genomes
Data type: xlsx
Comparison of CAZymes between PO212 and P2niaD18 genomes
Data type: xlsx
Repetitive elements for PO212 strain
Data type: xlsx
Repetitive elements for S27 strain
Data type: xlsx
Repetitive elements for P2niaD18 strain
Data type: xlsx
Repetitive elements for Wisconsin 54-1255 strain
Data type: xlsx