4.7 Article

Alignment-free estimation of sequence conservation for identifying functional sites using protein sequence embeddings

Related references

Note: Only a subset of the references is listed here; download the original article for the complete reference information.
Article Biochemistry & Molecular Biology

AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models

Mihaly Varadi et al.

Summary: AlphaFold DB is an openly accessible database with high-accuracy protein-structure predictions, powered by DeepMind's AlphaFold v2.0. It provides programmatic access to a vast number of predicted structures and is expanding to cover more sequences.

NUCLEIC ACIDS RESEARCH (2022)

Article Computer Science, Hardware & Architecture

Language Models: Past, Present, and Future

Hang Li

Summary: Pre-trained language models have shown remarkable advantages in improving NLP task accuracy and serving as universal language processing tools.

COMMUNICATIONS OF THE ACM (2022)

Article Biochemistry & Molecular Biology

UniProt: the universal protein knowledgebase in 2021

Alex Bateman et al.

Summary: The UniProt Knowledgebase aims to provide users with a comprehensive, high-quality set of protein sequences annotated with functional information. Updates over the past two years have increased the number of sequences to approximately 190 million, with new methods to assess proteome completeness and quality. UniProtKB has responded to the COVID-19 pandemic by expertly curating relevant entries and making them rapidly available through a dedicated portal.

NUCLEIC ACIDS RESEARCH (2021)

Article Multidisciplinary Sciences

Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences

Alexander Rives et al.

Summary: The deep contextual language model trained through unsupervised learning on protein sequences contains information about biological properties, has a multiscale structural organization, and can be used to improve predictions for protein mutational effects, secondary structure, and long-range contacts.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2021)

Article Multidisciplinary Sciences

Highly accurate protein structure prediction with AlphaFold

John Jumper et al.

Summary: Proteins are essential for life, and accurate prediction of their structures is a crucial research problem. Current experimental methods are time-consuming, highlighting the need for accurate computational approaches to address the gap in structural coverage. Despite recent progress, existing methods fall short of atomic accuracy in protein structure prediction.

NATURE (2021)

Review Biochemistry & Molecular Biology

Learning the protein language: Evolution, structure, and function

Tristan Bepler et al.

Summary: Language models are powerful machine-learning approaches for distilling information from protein sequence databases and capturing structural, functional, and evolutionary knowledge. The knowledge extracted by these models can improve protein function prediction and revolutionize protein biology research. Further developments are needed to incorporate strong biological priors into protein language models and increase their accessibility to the broader community.

CELL SYSTEMS (2021)

Article Biochemistry & Molecular Biology

The language of proteins: NLP, machine learning & protein sequences

Dan Ofer et al.

Summary: NLP methods have made significant progress in studying proteins, allowing for effective encoding and analysis of protein information. By transforming protein data into text format, a variety of NLP techniques can be applied to address tasks related to proteins.

COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL (2021)

Article Biochemistry & Molecular Biology

Pfam: The protein families database in 2021

Jaina Mistry et al.

Summary: The Pfam database has recently added a large number of protein families and domains, made revisions for COVID-19 research, and introduced Pfam-B as a supplement. These updates and improvements can help researchers classify protein sequences more effectively and conduct related studies.

NUCLEIC ACIDS RESEARCH (2021)

Article Biochemistry & Molecular Biology

CDD/SPARCLE: the conserved domain database in 2020

Shennan Lu et al.

NUCLEIC ACIDS RESEARCH (2020)

Article Biochemical Research Methods

HH-suite3 for fast remote homology detection and deep protein annotation

Martin Steinegger et al.

BMC BIOINFORMATICS (2019)

Review Biotechnology & Applied Microbiology

Alignment-free sequence comparison: benefits, applications, and tools

Andrzej Zielezinski et al.

GENOME BIOLOGY (2017)

Article Biochemical Research Methods

Sequence similarity network reveals common ancestry of multidomain proteins

Nan Song et al.

PLOS COMPUTATIONAL BIOLOGY (2008)

Article Biochemical Research Methods

Predicting functionally important residues from sequence conservation

John A. Capra et al.

BIOINFORMATICS (2007)

Article Multidisciplinary Sciences

Regulation of PDGF signalling and vascular remodelling by peroxiredoxin II

MH Choi et al.

NATURE (2005)

Article Statistics & Probability

Regularization and variable selection via the elastic net

H Zou et al.

JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY (2005)

Article Biochemistry & Molecular Biology

WebLogo: A sequence logo generator

GE Crooks et al.

GENOME RESEARCH (2004)