The sequence and de novo assembly of the wild yak genome

Background & Summary

The yak, which can survive extremely cold, harsh and oxygen-poor conditions, is endemic to the Qinghai-Tibetan Plateau (QTP) and adjacent high-altitude regions1. Yak were domesticated from wild yak by nomadic people at least 7,300 years ago2. Today, more than 22 million domestic yak (Bos grunniens) provide necessities such as food, transport, shelter and fuel for Tibetans and other people living in high-altitude areas1. In addition, 15,000–20,000 wild yak (Bos mutus) still survive in the northwestern parts of the QTP3,4. Because of long-term overbreeding and inbreeding under traditional yak breeding practices, the reproductive capacity, growth rate, adult size and milk production of domestic yak have declined and mortality has increased, especially among newborn and young animals1. In contrast, the Datong yak, the only artificially bred yak breed and a cross between wild and domestic yak, shows excellent growth characteristics and production performance. Datong yak are generally 30% and 50% heavier than domestic yak at birth and at six months of age, respectively, produce more than 15% more milk, and have 25% and 31% higher carcass weights at 6 and 18 months of age, respectively5,6. The high growth, development and production rates of the Datong yak demonstrate the feasibility of improving domestic yak traits with wild yak resources, and the potential importance of exploiting wild yak genetic resources for future yak breeding. However, the wild yak genome had not previously been sequenced, which has impeded both research and breeding efforts.

Thus, to elucidate the genomic features of this vulnerable species, we constructed a draft genome for wild yak. We extracted genomic DNA from blood and constructed three paired-end (PE) and three mate-pair (MP) libraries, which were sequenced on the Illumina HiSeq 2000 platform. After quality filtering and trimming of the raw data, Genome Characteristics Estimation (GCE, v1.0.0)7 was used to estimate the genome size from PE reads, and Platanus (v1.2.4)8 was used to assemble the genome from all clean data. GapCloser (v1.12)9 was then used to perform an additional round of gap closing on the assembly. The final assembly was 2.83 Gb in size, containing 808,541 contigs (N50 = 63.2 kb) and 734,073 scaffolds (N50 = 16.3 Mb), and represents 91.5% of the estimated genome size. Structural annotation of the genome yielded 22,910 genes, 90.18% of which could be functionally annotated using at least one of five databases (TrEMBL, SwissProt, InterPro, GO and KEGG). The wild yak genome assembled in this study provides a valuable genetic resource for future efforts to protect the vulnerable wild yak and for comparative genomic analyses among bovine species to promote breeding research.

Methods

Sample collection, library construction and sequencing

Genomic DNA was extracted using a standard phenol/chloroform method from a blood sample collected from a female wild yak that was originally captured in the wild and reared at the Datong Yak Farm of Qinghai Province (37°15′35.6″N, 101°22′24.0″E, altitude around 3,200 m). The quality and integrity of the extracted DNA were checked by measuring its A260/A280 ratio and by agarose gel electrophoresis. For paired-end libraries with insert sizes of 280, 500 and 800 bp, 6 μg of genomic DNA per library was used with the Illumina TruSeq DNA Nano Preparation Kit (Illumina, San Diego, CA, USA). For mate-pair libraries with insert sizes of 2, 5 and 10 kb, 60 μg of DNA per library was used for circularization and subsequent library construction with the Nextera Mate Pair Library Preparation Kit (Illumina, San Diego, CA, USA). Both the sample collection and library construction protocols were approved by the Ethics Committee of Lanzhou University. All libraries were sequenced on an Illumina HiSeq 2000 platform with a read length of 150 bp, following the manufacturer's instructions. In total, 760.85 Gb of raw data were generated (Table 1).

Preprocessing and genome size estimation

All sequencing reads were preprocessed for quality control and filtered with stringent criteria using Lighter (v1.1.1)10. First, reads with more than 10% unknown bases (N) were removed. Then, read pairs in which low-quality bases (quality score ≤7) covered more than 65% of the read length were filtered out. Reads with PCR duplicates or adapter contamination were also removed. Finally, read pairs whose read 1 and read 2 overlapped by more than 10 bp (allowing 10% mismatches) were discarded. In total, 662.3 Gb of clean reads were obtained after filtering (Table 1).
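
The N-content and low-quality criteria above can be expressed as a minimal Python sketch. It assumes Phred+33 quality encoding, represents each mate as a hypothetical (sequence, quality) tuple, and drops a pair if either mate fails; duplicate, adapter and overlap filtering are omitted. It illustrates the stated thresholds only and is not the pipeline the authors ran.

```python
from typing import Tuple

Read = Tuple[str, str]  # (sequence, quality string); Phred+33 encoding is assumed


def fraction_n(seq: str) -> float:
    """Fraction of unknown bases (N) in a read; empty reads count as all-unknown."""
    if not seq:
        return 1.0
    return seq.upper().count("N") / len(seq)


def fraction_low_quality(qual: str, threshold: int = 7, offset: int = 33) -> float:
    """Fraction of bases whose Phred score is <= threshold."""
    if not qual:
        return 1.0
    low = sum(1 for ch in qual if ord(ch) - offset <= threshold)
    return low / len(qual)


def keep_pair(read1: Read, read2: Read,
              max_n: float = 0.10, max_low_q: float = 0.65) -> bool:
    """Keep a pair only if both mates pass the N-content and low-quality filters."""
    for seq, qual in (read1, read2):
        if fraction_n(seq) > max_n:
            return False
        if fraction_low_quality(qual) > max_low_q:
            return False
    return True
```

Filtering at the pair level keeps the two FASTQ files synchronized, which paired-end and mate-pair aware assemblers such as Platanus expect.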

Prior to genome assembly, all preprocessed reads from the short-insert libraries were used to estimate the genome size with GCE, using a k-mer size of 21. The genome size of wild yak was estimated to be around 3.09 Gb using the formula genome size = k-mer number / k-mer depth, where the k-mer number is the total number of k-mers and the k-mer depth is the depth at the main peak of the k-mer frequency distribution (Fig. 1).
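
The formula above can be illustrated with a minimal Python sketch: count k-mers, build the depth histogram, locate the main peak while ignoring the low-depth error tail, and divide the total k-mer number by the peak depth. The function names and the `min_depth` cut-off are hypothetical, and k-mers are counted on one strand only; dedicated tools such as GCE model the distribution more carefully, so this simple ratio is only an approximation.

```python
from collections import Counter
from typing import Dict, Iterable


def kmer_histogram(reads: Iterable[str], k: int) -> Dict[int, int]:
    """Map each k-mer frequency (depth) to the number of distinct k-mers seen that often.

    Simplification: k-mers are counted on the given strand only (no canonical
    k-mers), and k-mers containing N are skipped.
    """
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return dict(Counter(counts.values()))


def estimate_genome_size(hist: Dict[int, int], min_depth: int = 2) -> float:
    """Genome size = total k-mer number / depth of the main peak."""
    total_kmers = sum(depth * n for depth, n in hist.items())
    # Skip the low-depth tail (mostly error k-mers) when locating the main peak.
    candidates = {d: n for d, n in hist.items() if d >= min_depth}
    peak_depth = max(candidates, key=candidates.get)
    return total_kmers / peak_depth
```

For example, a toy histogram `{1: 100, 2: 10, 10: 50, 11: 40, 20: 5}` has 1,160 k-mers in total and a main peak at depth 10, giving an estimate of 116 bases.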

Fig. 1: 21-mer distribution in the wild yak genome.

Genome assembly

For de novo genome assembly, Platanus was used to construct contigs and scaffolds with default parameters, and GapCloser was then used to fill the remaining gaps in the scaffolds using all sequencing reads. These steps yielded a wild yak draft genome with a total length of 2.83 Gb, accounting for 91.5% of the estimated genome size (contig and scaffold N50 sizes of 63.2 kb and 16.3 Mb, respectively) (Table 2).
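
The contig and scaffold N50 values quoted above follow the usual definition: sort sequences from longest to shortest and report the length at which the running total first reaches half of the assembly. A minimal Python sketch (function names are hypothetical) also computes the share of the assembly held in sequences above a length threshold, the kind of statistic reported for scaffolds longer than 1 Mb.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no sequence lengths supplied")


def fraction_in_sequences_longer_than(lengths: Iterable[int], threshold: int) -> float:
    """Fraction of total assembly length contained in sequences longer than threshold."""
    values = list(lengths)
    total = sum(values)
    return sum(x for x in values if x > threshold) / total
```

With lengths 2 to 10 (54 bases in total), the three longest sequences (10, 9 and 8) reach exactly half, so the N50 is 8.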

To evaluate the completeness of our assembly, we performed a BUSCO11 analysis: 3,974 of the 4,104 conserved single-copy mammalian genes were found in our assembly, of which 3,799 were complete and single-copy, 55 were complete and duplicated, and 120 were fragmented (Table 3). To assess the single-base accuracy of the assembly, we aligned the high-quality reads from the short-insert libraries to the assembly using the Burrows-Wheeler Aligner (BWA, v0.7.15-r1140)12 and converted the alignments to Binary Alignment Map (BAM) format with SAMtools (v1.3)13. Genome coverage was then calculated with a custom Perl script, which showed that more than 93.9% of the assembly had >20-fold coverage.
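
The custom Perl script is not published. As an illustrative stand-in, the following Python sketch computes the fraction of positions with more than 20-fold coverage from per-position depths, assuming tab-separated lines of sequence name, position and depth such as those written by `samtools depth` (recent versions accept `-a` to also report zero-coverage positions, which is needed for the denominator to equal the assembly length). The function name is hypothetical.

```python
from typing import Iterable


def fraction_above_depth(depth_lines: Iterable[str], min_depth: int = 20) -> float:
    """Fraction of positions with read depth strictly greater than min_depth.

    Each line is expected to be tab-separated: sequence name, 1-based position,
    depth. Zero-depth positions must be included for the denominator to equal
    the assembly length.
    """
    total = 0
    covered = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        total += 1
        if int(fields[2]) > min_depth:
            covered += 1
    return covered / total if total else 0.0
```

Because an open text file iterates line by line, a depth file can be streamed directly into the function without loading the whole genome's depths into memory.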

Repeat annotation

Repetitive regions of the wild yak genome were identified using a combination of de novo and homology-based approaches, as in a previous analysis of the Ovis ammon polii genome14. For de novo prediction, RepeatModeler (v1.0.11) was first used to construct a de novo repeat library, and RepeatMasker (v4.0.7)15 was then used to identify repeats using both the RepBase16 library of known transposable elements (TEs) and the self-trained repeat library. Next, we applied RepeatProteinMask (part of the RepeatMasker package) to identify repeats at the protein level using the TE protein database. Tandem repeats were annotated using Tandem Repeats Finder (TRF, v4.09)17. Finally, overlapping annotations were merged according to their genomic coordinates to obtain a non-redundant repeat set. Overall, we identified 1.41 Gb of non-redundant repetitive sequence, representing 49.65% of the wild yak genome assembly; long interspersed nuclear elements (LINEs) were the most abundant class, accounting for 35.98% of the genome (Fig. 2; Table 4).
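
Combining annotations from several repeat tools requires counting each base once. A minimal Python sketch of one common way to do this, merging overlapping intervals per sequence, is shown below; it assumes 0-based, half-open (start, end) coordinates and hypothetical function names, and is not the authors' code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (sequence name, start, end) using 0-based, half-open coordinates
Interval = Tuple[str, int, int]


def merge_intervals(intervals: Iterable[Interval]) -> Dict[str, List[Tuple[int, int]]]:
    """Merge overlapping or touching intervals separately for each sequence."""
    by_seq: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for name, start, end in intervals:
        by_seq[name].append((start, end))
    merged: Dict[str, List[Tuple[int, int]]] = {}
    for name, spans in by_seq.items():
        spans.sort()
        out = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= out[-1][1]:  # overlaps or touches the previous interval
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[name] = [(s, e) for s, e in out]
    return merged


def non_redundant_length(intervals: Iterable[Interval]) -> int:
    """Total bases covered by at least one interval, counting each base once."""
    return sum(e - s for spans in merge_intervals(intervals).values() for s, e in spans)
```

Dividing the non-redundant length by the assembly length gives the repeat fraction reported above.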

Fig. 2: Sequence divergence rates of repeats annotated by RepeatMasker in wild yak. The x-axis represents the sequence divergence rate of repeats, and the y-axis represents the percentage of the genome occupied by repeat sequences.

Gene prediction and annotation

We combined homology-based and de novo prediction methods to identify protein-coding genes. For homology-based prediction, protein sequences from seven species (Bos taurus, Equus caballus, Homo sapiens, Ovis aries, Sus scrofa, Bison bonasus and Bos grunniens), downloaded from Ensembl18 and GigaDB19,20, were aligned to the wild yak genome using TBLASTN21. GeneWise (v2.4.1)22 was then used to obtain accurately spliced alignments against the filtered homologous genomic regions. For de novo prediction, we used Augustus23, Geneid24, GeneMark, GlimmerHMM25 and SNAP26 to predict genes on the repeat-masked genome, with parameters trained on wild yak and human data. EVidenceModeler (EVM, v1.1.1)27 was used to integrate the homology-based and de novo predictions into a consensus gene set. Low-quality genes encoding short proteins (fewer than 30 amino acids) and/or exhibiting premature termination were removed to produce the final gene set of 22,910 genes (Fig. 3; Table 5).
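
The final filtering step can be sketched in Python as follows, assuming each gene's translated protein is available as a string in which stop codons are written as `*`; a single terminal stop is tolerated, and any internal stop is treated as premature termination. The function names and the gene-to-protein dictionary are hypothetical.

```python
from typing import Dict


def is_high_quality(protein: str, min_length: int = 30) -> bool:
    """True if the protein has >= min_length residues and no internal stop ('*')."""
    # A single terminal stop symbol is tolerated and not counted as a residue.
    body = protein[:-1] if protein.endswith("*") else protein
    return len(body) >= min_length and "*" not in body


def filter_gene_set(proteins: Dict[str, str], min_length: int = 30) -> Dict[str, str]:
    """Keep only genes whose translated protein passes is_high_quality."""
    return {gene_id: seq for gene_id, seq in proteins.items()
            if is_high_quality(seq, min_length)}
```

A 30-residue protein is kept and a 29-residue one is removed, matching the "fewer than 30 amino acids" criterion.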

Fig. 3: Comparison of structural characteristics of wild yak genes with those of other mammals. (a) mRNA length, (b) CDS length, (c) exon length, (d) intron length and (e) exon number per gene of wild yak, domestic yak, Bos taurus (UMD3.1), Ovis aries (Oar v3.1) and Bison bonasus (version 1.0). The x-axis represents length or number, and the y-axis represents the density of genes.

Putative biological functions of the predicted high-quality genes were assigned by searching five public databases: TrEMBL, Swiss-Prot28, InterPro29, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG)30. In total, 90.18% of the genes were functionally annotated by at least one of these databases, with 90.05%, 88.15%, 83.51%, 64.20% and 53.01% having hits in TrEMBL, Swiss-Prot, InterPro, GO and KEGG, respectively (Table 6).
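
The overall annotation rate is the size of the union of genes with a hit in any database, which is why it is only slightly higher than the best single database (TrEMBL). A minimal Python sketch, assuming hypothetical per-database sets of annotated gene IDs:

```python
from typing import Dict, Set


def annotation_rates(hits: Dict[str, Set[str]], n_genes: int) -> Dict[str, float]:
    """Percentage of genes with a hit in each database and in at least one database."""
    rates = {db: 100.0 * len(genes) / n_genes for db, genes in hits.items()}
    annotated_any: Set[str] = set().union(*hits.values())
    rates["any database"] = 100.0 * len(annotated_any) / n_genes
    return rates
```

For five genes with TrEMBL hits for g1 to g3 and KEGG hits for g2 and g4, the per-database rates are 60% and 40%, and the combined rate is 80%.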

Full size table

Data Records

The whole-genome sequencing data were deposited in the NCBI Sequence Read Archive (SRA)31 under accession number SRP194583 (BioProject PRJNA531398). The assembled draft genome of wild yak has been deposited in GenBank32 under accession number VBQZ00000000. The annotation results for repetitive sequences, gene structures and functional predictions have been deposited in Figshare33.

Technical Validation

Quality assessment of the genome assembly

The assembly presented here is the first genome assembly of wild yak. The contig N50 and scaffold N50 sizes were 63.2 kb and 16.3 Mb, respectively, and the longest scaffold was 75,900,441 bp. There are 258 scaffolds longer than 1 Mb, with a total length of 2,486,540,864 bp, representing 87.83% of the wild yak genome assembly. By aligning reads from the short-insert libraries to the assembly, we found that more than 93.9% of the genome had >20-fold coverage, supporting high accuracy at the nucleotide level. BUSCO analysis of assembly completeness yielded a BUSCO score of 96.8% (complete = 93.8%, single = 92.4%, duplicated = 1.4%, fragmented = 3.0%, missing = 3.2%, total genes = 4,104). These results are comparable to those of the published European bison (wisent)34 and domestic yak4 genomes, suggesting that our assembly is of high quality and largely complete.
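
The BUSCO score quoted above appears to be the sum of complete and fragmented BUSCOs (93.8% + 3.0% = 96.8%). A minimal Python sketch of how the category percentages relate to raw counts is shown below; the function name is hypothetical, and the example uses toy counts rather than the values in Table 3.

```python
from typing import Dict


def busco_summary(single: int, duplicated: int, fragmented: int,
                  total: int) -> Dict[str, float]:
    """BUSCO category percentages, rounded to one decimal place."""
    complete = single + duplicated
    missing = total - complete - fragmented

    def pct(count: int) -> float:
        return round(100.0 * count / total, 1)

    return {
        "complete": pct(complete),
        "single": pct(single),
        "duplicated": pct(duplicated),
        "fragmented": pct(fragmented),
        "missing": pct(missing),
        # The score quoted in the text appears to be complete + fragmented.
        "complete_plus_fragmented": pct(complete + fragmented),
    }
```

With 90 single-copy, 5 duplicated and 3 fragmented hits out of 100 BUSCOs, this gives 95.0% complete, 2.0% missing and 98.0% complete plus fragmented.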

Gene prediction and annotation validation

Gene models in the wild yak assembly were predicted using a combination of homology-based and ab initio gene prediction approaches, and EVM was then used to integrate the predictions into a consensus gene set. To improve the quality of the gene set, we removed low-quality genes encoding short proteins (fewer than 30 amino acids) and/or exhibiting premature termination. The final gene set consisted of 22,910 genes, and the distributions of gene length, CDS length, exon length, intron length and exon number were similar to those of other mammals (Fig. 3). BUSCO analysis was also performed to assess the completeness of the predicted gene set, yielding a BUSCO score of 97.7% (complete = 94.8%, single = 93.1%, duplicated = 1.7%, fragmented = 2.9%, missing = 2.3%, total genes = 4,104) (Table 6). In addition, 90.18% of the predicted genes could be assigned at least one functional term (Table 5). Together, these results indicate that the annotated gene set of the wild yak genome is largely complete.

Code availability

The software versions, settings and parameters used are described below.

(1) GCE, version 1.0.0, parameters used: kmer_freq_hash -k 21 -l reads.list -t 24 -i 5000000 -o 0 -p wild_yak &> kmer_freq.log; gce -f wild_yak.freq.stat -c 46 -g 155866493014 -m 1 -D 8 -b 1 > wild_yak.table 2> wild_yak.log.

(2) Lighter, version 1.1.1, parameters used: -k 17 3000000000 -trim -t 20.

(3) Platanus, version 1.2.4, parameters used: platanus assemble -o wild_yak -f <280 bp insert paired-end reads> <500 bp insert paired-end reads> <800 bp insert paired-end reads> -t 30 -m 500 -tmp temp; platanus scaffold -o wild_yak -c wild_yak_contig.fa -b wild_yak_contigBubble.fa -IP1 <280 bp insert paired-end reads> -a1 280 -IP2 <500 bp insert paired-end reads> -a2 500 -IP3 <800 bp insert paired-end reads> -a3 800 -OP4 <2 kb insert mate-pair reads> -a4 2000 -OP5 <5 kb insert mate-pair reads> -a5 5000 -OP6 <10 kb insert mate-pair reads> -a6 10000; platanus gap_close -o wild_yak -c wild_yak_scaffold.fa -IP1 <280 bp insert paired-end reads> -IP2 <500 bp insert paired-end reads> -IP3 <800 bp insert paired-end reads> -OP4 <2 kb insert mate-pair reads> -OP5 <5 kb insert mate-pair reads> -OP6 <10 kb insert mate-pair reads>.

(4) GapCloser, version 1.12, parameters used: -l 150 -t 30; in the configuration file: max_rd_len = 100; paired-end libraries: reverse_seq = 0, asm_flags = 3; mate-pair libraries: reverse_seq = 1, asm_flags = 2.

(5) BUSCO, version 3, default parameters with the mammalia_odb9 lineage dataset.

(6) BWA, version 0.7.15-r1140: default parameters.

(7) SAMtools, version 1.3, default parameters.

(8) RepeatMasker, version 4.0.7 (with RepBase library release-20170127).

(9) RepeatModeler, version open-1.0.11.

(10) TRF, version 4.09, parameters used: trf wild_yak.gapclose.fa 2 7 7 80 10 50 500 -d -h.

(11) TBLASTN, version 2.5.0, parameter used: -e 1E-5.

(12) GeneWise, version 2.4.1, parameters used: -tfor/-trev (-tfor for genes on the forward strand and -trev for genes on the reverse strand) -gff.

(13) Augustus, version 3.2.3, parameter used: --species=human.

(14) Geneid, version 1.0, parameters used: -3 -P.

(15) GeneMark, version 3.9, parameter used: -f gff3.

(16) SNAP, version 2006-07-28, parameter used: -gff.

(17) GlimmerHMM, version 3.0.4, default parameters.

(18) EVM, version 1.1.1, default parameters.

(19) InterProScan, version 5.25-64.0, parameters: -f tsv -iprlookup -goterms -pa -t p.
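
TRF takes its scoring parameters positionally, so the string in item (10) is easy to misread. The following Python sketch, with a hypothetical function name, builds the same command from named parameters in TRF's documented order (match, mismatch, delta, match probability, indel probability, minimum score, maximum period); it only constructs the argument list and does not run TRF.

```python
from typing import List


def trf_command(fasta: str, match: int = 2, mismatch: int = 7, delta: int = 7,
                pm: int = 80, pi: int = 10, min_score: int = 50,
                max_period: int = 500) -> List[str]:
    """Build a Tandem Repeats Finder command line from named parameters.

    Positional order: match, mismatch, delta (indel), match probability (PM),
    indel probability (PI), minimum alignment score, maximum period size.
    """
    return ["trf", fasta, str(match), str(mismatch), str(delta), str(pm),
            str(pi), str(min_score), str(max_period),
            "-d",  # write a .dat file with the repeat table
            "-h"]  # suppress HTML output
```

If TRF is installed, the returned list can be passed to `subprocess.run(cmd, check=True)`; the defaults reproduce the parameters listed in item (10).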

References

  1. Wiener, G., Han, J. & Long, R. The Yak 2nd edn. (Regional Office for Asia and the Pacific, Food and Agriculture Organization of the United Nations, Bangkok, 2003).

  2. Qiu, Q. et al. Yak whole-genome resequencing reveals domestication signatures and prehistoric population expansions. Nat. Commun. 6, 10283 (2015).

  3. Schaller, G. B. & Liu, W. Distribution, status, and conservation of wild yak Bos grunniens. Biol. Conserv. 76, 1–8 (1996).

  4. Qiu, Q. et al. The yak genome and adaptation to life at high altitude. Nat. Genet. 44, 946–949 (2012).

  5. Jialin, B., Mingqiang, W., Zhonglin, L. & Chesworth, J. M. Meat production from crossbred and domestic yaks in China. Anim. Sci. 66, 465–469 (1998).

  6. Jialin, B., Mingqiang, W., Zhonglin, L. & Chesworth, J. M. The milking performance of dual-purpose crossbred yaks. Anim. Sci. 66, 471–473 (1998).

  7. Liu, B. et al. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. Quant. Biol. (2013).

  8. Kajitani, R. et al. Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Res. 24, 1384–1395 (2014).

  9. Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience 1, 18 (2012).

  10. Song, L., Florea, L. & Langmead, B. Lighter: fast and memory-efficient sequencing error correction without counting. Genome Biol. 15, 1 (2014).

  11. Simao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).

  12. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  13. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  14. Yang, Y. et al. Draft genome of the Marco Polo Sheep (Ovis ammon polii). GigaScience 6, 1–7 (2017).

  15. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics Chapter 4, 4.10.1–4.10.14 (2009).

  16. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).

  17. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).

  18. Yates, A. et al. Ensembl 2016. Nucleic Acids Res. 44, D710–D716, https://doi.org/10.1093/nar/gkv1157 (2016).

  19. Qiu, Q. et al. Genomic data from the domestic yak (Bos grunniens). GigaScience Database. https://doi.org/10.5524/100071 (2013).

  20. Wang, K. et al. Draft genome of European bison (wisent), Bison bonasus. GigaScience Database. https://doi.org/10.5524/100254 (2017).

  21. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).

  22. Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res. 14, 988–995 (2004).

  23. Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–W439 (2006).

  24. Blanco, E., Parra, G. & Guigo, R. Using geneid to identify genes. Curr. Protoc. Bioinformatics 4.3.1–4.3.28 (2007).

  25. Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).

  26. Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).

  27. Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).

  28. Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000).

  29. Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).

  30. Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000).

  31. NCBI Sequence Read Archive. http://identifiers.org/ncbi/insdc.sra:SRP194583 (2019).

  32. Liu, Y. Bos mutus breed Datong Yak isolate WY2019, whole genome shotgun sequencing project. GenBank. http://identifiers.org/ncbi/insdc:VBQZ00000000 (2019).

  33. Liu, Y. The sequence and de novo assembly of the wild yak genome. figshare. https://doi.org/10.6084/m9.figshare.8031800.v2 (2019).

  34. Wang, K. et al. The genome sequence of the wisent (Bison bonasus). GigaScience 6, 1–5 (2017).

Acknowledgements

This study was supported by the National Natural Science Foundation of China (grant nos. 31661143020, 41620104007, 31801089), the National Program for Support of Top-notch Young Professionals, and the Fok Ying Tung Education Foundation (151105).

Author information

Authors and Affiliations

  1. State Key Laboratory of Grassland Agro-Ecosystems, School of Life Sciences, Lanzhou University, Lanzhou, China

    Yanbin Liu, Jiayu Luo, Jiajia Dou, Biyao Yan, Qingmiao Ren, Bolin Tang & Qiang Qiu

  2. Research Center for Ecology and Environmental Sciences, Northwestern Polytechnical University, Xi’an, China

    Kun Wang

Contributions

Q. Qiu and K. Wang designed and supervised the project; B. Yan and Q. Ren prepared the samples; Y. Liu, J. Luo, J. Dou and B. Tang analyzed the data; Y. Liu wrote the manuscript with other authors’ help and Q. Qiu and K. Wang revised the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Kun Wang or Qiang Qiu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.

About this article

Cite this article

Liu, Y., Luo, J., Dou, J. et al. The sequence and de novo assembly of the wild yak genome. Sci Data 7, 66 (2020). https://doi.org/10.1038/s41597-020-0400-3

  • DOI: https://doi.org/10.1038/s41597-020-0400-3
