4.6 Article

DPCfam: Unsupervised protein family classification by Density Peak Clustering of large sequence datasets

Related References

Note: only some of the references are listed; see the original article for the complete reference list.
Article Biochemistry & Molecular Biology

AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models

Mihaly Varadi et al.

Summary: AlphaFold DB is an openly accessible database with high-accuracy protein-structure predictions, powered by DeepMind's AlphaFold v2.0. It provides programmatic access to a vast number of predicted structures and is expanding to cover more sequences.

NUCLEIC ACIDS RESEARCH (2022)

Article Biochemistry & Molecular Biology

UniProt: the universal protein knowledgebase in 2021

Alex Bateman et al.

Summary: The UniProt Knowledgebase aims to provide users with a comprehensive, high-quality set of protein sequences annotated with functional information. Updates over the past two years have increased the number of sequences to approximately 190 million, with new methods to assess proteome completeness and quality. UniProtKB has responded to the COVID-19 pandemic by expertly curating relevant entries and making them rapidly available through a dedicated portal.

NUCLEIC ACIDS RESEARCH (2021)

Article Biochemical Research Methods

Density Peak clustering of protein sequences associated to a Pfam clan reveals clear similarities and interesting differences with respect to manual family annotation

Elena Tea Russo et al.

Summary: This study introduces a procedure based on Density Peak Clustering to automatically identify putative protein families. The results show that the algorithm identifies protein families and family architectures in an unsupervised manner, suggesting applications in automatic protein classification. Comparison with the Pfam classification shows substantial overlap along with some interesting differences. Further experiments on large and diverse sequence datasets are needed to confirm the potential of the approach.

BMC BIOINFORMATICS (2021)
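The density-peak procedure summarised above goes back to Rodriguez & Laio (Science, 2014, listed below). As a rough illustration only, the following Python sketch implements the basic rule on a precomputed distance matrix: each point gets a local density ρ (number of neighbours within a cutoff d_c) and a distance δ to its nearest denser point, the points with the largest ρ·δ become cluster centres, and every other point joins the cluster of its nearest denser neighbour. It is not the DPCfam or Russo et al. pipeline; the function name, the cutoff kernel, the centre-selection rule and the toy 1-D data are assumptions made for this example. In a protein setting the distances would instead be derived from pairwise sequence comparisons.

```python
from typing import List, Sequence


def density_peak_clustering(
    dist: Sequence[Sequence[float]], d_c: float, n_clusters: int
) -> List[int]:
    """Hypothetical helper: density-peak clustering (cutoff kernel) on a
    precomputed symmetric distance matrix. Returns one label per point."""
    n = len(dist)
    if not 1 <= n_clusters <= n:
        raise ValueError("n_clusters must be between 1 and the number of points")

    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]

    # Rank points by decreasing density; ties are broken by index so that
    # "higher density" is a strict order and every step below is well defined.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n  # distance to the nearest higher-ranked point
    parent = [-1] * n  # index of that point (-1 for the densest point)
    for rank, i in enumerate(order):
        if rank == 0:
            # Convention: the densest point gets its largest distance to any point.
            delta[i] = max((dist[i][j] for j in range(n) if j != i), default=0.0)
            continue
        nearest = min(order[:rank], key=lambda j: dist[i][j])
        delta[i] = dist[i][nearest]
        parent[i] = nearest

    # Centres: points with the largest gamma = rho * delta. The densest point
    # has no parent, so it is always taken as the first centre.
    by_gamma = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))
    centres = [order[0]] + [i for i in by_gamma if i != order[0]][: n_clusters - 1]

    labels = [-1] * n
    for k, c in enumerate(centres):
        labels[c] = k
    # Walk down the density ranking: each remaining point inherits the label
    # of its nearest higher-density neighbour, which is already labelled.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


if __name__ == "__main__":
    # Toy 1-D data: two well-separated groups of three points each.
    xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
    dist = [[abs(a - b) for b in xs] for a in xs]
    print(density_peak_clustering(dist, d_c=0.5, n_clusters=2))  # [0, 0, 0, 1, 1, 1]
```

Ranking centres by ρ·δ is only one simple option; Rodriguez & Laio also describe picking centres by inspecting a decision graph of ρ against δ, and other selection rules are possible.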

Article Biochemical Research Methods

Sensitive protein alignments at tree-of-life scale using DIAMOND

Benjamin Buchfink et al.

Summary: We are at the beginning of a genomic revolution in which all known species are planned to be sequenced. The improved version of DIAMOND enables fast and sensitive protein alignments at tree-of-life scale.

NATURE METHODS (2021)

Article Multidisciplinary Sciences

Highly accurate protein structure prediction with AlphaFold

John Jumper et al.

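Summary: Proteins are essential for life, and accurate prediction of their structures is a crucial research problem. Current experimental methods are time-consuming, highlighting the need for accurate computational approaches to close the gap in structural coverage. Despite recent progress, existing methods fall short of atomic accuracy in protein structure prediction. The paper presents AlphaFold, a neural-network-based method that predicts protein structures with atomic accuracy even when no similar structure is known, validated in the CASP14 assessment.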

NATURE (2021)

Article Biochemistry & Molecular Biology

Pfam: The protein families database in 2021

Jaina Mistry et al.

Summary: The Pfam database has recently added a large number of protein families and domains, revised the entries covering the SARS-CoV-2 proteome to support COVID-19 research, and reintroduced Pfam-B as an automatically generated supplement. These updates help researchers classify protein sequences more effectively and conduct related studies.

NUCLEIC ACIDS RESEARCH (2021)

Article Biochemical Research Methods

Disentangling the complexity of low complexity proteins

Pablo Mier et al.

BRIEFINGS IN BIOINFORMATICS (2020)

Article Biochemistry & Molecular Biology

CDD/SPARCLE: the conserved domain database in 2020

Shennan Lu et al.

NUCLEIC ACIDS RESEARCH (2020)

Article Biochemical Research Methods

DeepCoil-a fast and accurate prediction of coiled-coil domains in protein sequences

Jan Ludwiczak et al.

BIOINFORMATICS (2019)

Article Biochemistry & Molecular Biology

InterPro in 2019: improving coverage, classification and access to protein sequence annotations

Alex L. Mitchell et al.

NUCLEIC ACIDS RESEARCH (2019)

Article Biochemistry & Molecular Biology

20 years of the SMART protein domain annotation resource

Ivica Letunic et al.

NUCLEIC ACIDS RESEARCH (2018)

Article Biochemistry & Molecular Biology

IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding

Balint Meszaros et al.

NUCLEIC ACIDS RESEARCH (2018)

Letter Biotechnology & Applied Microbiology

MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets

Martin Steinegger et al.

NATURE BIOTECHNOLOGY (2017)

Article Biochemistry & Molecular Biology

Manual classification strategies in the ECOD database

Hua Cheng et al.

PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS (2015)

Article Biochemistry & Molecular Biology

The Structure-Function Linkage Database

Eyal Akiva et al.

NUCLEIC ACIDS RESEARCH (2014)

Article Multidisciplinary Sciences

Clustering by fast search and find of density peaks

Alex Rodriguez et al.

SCIENCE (2014)

Article Biochemistry & Molecular Biology

Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions

Jaina Mistry et al.

NUCLEIC ACIDS RESEARCH (2013)

Article Mathematical & Computational Biology

The challenge of increasing Pfam coverage of the human proteome

Jaina Mistry et al.

DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION (2013)

Article Biochemistry & Molecular Biology

Adaptive seeds tame genomic sequence comparison

Szymon M. Kielbasa et al.

GENOME RESEARCH (2011)

Article Biochemistry & Molecular Biology

Close encounters of the third kind: disordered domains and the interactions of proteins

Peter Tompa et al.

BIOESSAYS (2009)

Article Biochemical Research Methods

BLAST+: architecture and applications

Christiam Camacho et al.

BMC BIOINFORMATICS (2009)

Article Biochemical Research Methods

Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences

Weizhong Li et al.

BIOINFORMATICS (2006)

Article Biochemical Research Methods

EVEREST: automatic identification and classification of protein domains in all protein sequences

Elon Portugaly et al.

BMC BIOINFORMATICS (2006)

Review Cell Biology

Intrinsically unstructured proteins and their functions

HJ Dyson et al.

NATURE REVIEWS MOLECULAR CELL BIOLOGY (2005)

Article Biochemistry & Molecular Biology

MUSCLE: multiple sequence alignment with high accuracy and high throughput

RC Edgar

NUCLEIC ACIDS RESEARCH (2004)

Article Biochemistry & Molecular Biology

A combined transmembrane topology and signal peptide prediction method

L Käll et al.

JOURNAL OF MOLECULAR BIOLOGY (2004)

Article Biochemistry & Molecular Biology

Exhaustive enumeration of protein domain families

A Heger et al.

JOURNAL OF MOLECULAR BIOLOGY (2003)

Article Biochemistry & Molecular Biology

An efficient algorithm for large-scale detection of protein families

AJ Enright et al.

NUCLEIC ACIDS RESEARCH (2002)

Article Biochemistry & Molecular Biology

Protein family and fold occurrence in genomes: Power-law behaviour and evolutionary model

J Qian et al.

JOURNAL OF MOLECULAR BIOLOGY (2001)

Article Biochemistry & Molecular Biology

The COG database: a tool for genome-scale analysis of protein functions and evolution

RL Tatusov et al.

NUCLEIC ACIDS RESEARCH (2000)