UniProt has developed an automatic annotation system that enriches unreviewed TrEMBL entries in the UniProt Knowledgebase (UniProtKB) with automatically predicted annotations. In release 2020_04 of August 2020, a new and more powerful system called ARBA (Association-Rule-Based Annotator) replaced the previous SAAS (Statistical Automatic Annotation System). ARBA is a multiclass learning system trained on expertly annotated entries in UniProtKB/Swiss-Prot. It uses rule mining techniques to generate concise annotation models with the highest representativeness and coverage, based on two properties of each protein: its InterPro group membership and its taxonomy.
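To give a feel for what rule mining over InterPro membership and taxonomy can look like, here is a minimal toy sketch in Python. It is not ARBA's actual algorithm or code. It only illustrates the general idea of association rules filtered by support and confidence, and every identifier in it (`IPR_A`, `annotation_X`, `mine_rules`, `annotate`, the thresholds) is a made-up placeholder.

```python
from collections import Counter
from itertools import combinations

# Hypothetical reviewed entries: (properties, annotation).
# "IPR_A"/"IPR_B" stand in for InterPro groups, the other strings for taxa.
TRAINING = [
    ({"IPR_A", "Bacteria"}, "annotation_X"),
    ({"IPR_A", "Bacteria"}, "annotation_X"),
    ({"IPR_A", "Eukaryota"}, "annotation_Y"),
    ({"IPR_B", "Bacteria"}, "annotation_Z"),
    ({"IPR_B", "Eukaryota"}, "annotation_Z"),
]


def mine_rules(entries, min_support=2, min_confidence=0.9, max_conditions=2):
    """Return rules (conditions, annotation, support, confidence).

    Assumed thresholds, chosen only for this toy data set.
    """
    condition_counts = Counter()  # how often a set of conditions occurs
    rule_counts = Counter()       # how often it co-occurs with an annotation
    for props, annotation in entries:
        for size in range(1, max_conditions + 1):
            for conditions in combinations(sorted(props), size):
                condition_counts[conditions] += 1
                rule_counts[(conditions, annotation)] += 1

    rules = []
    for (conditions, annotation), support in rule_counts.items():
        confidence = support / condition_counts[conditions]
        if support >= min_support and confidence >= min_confidence:
            rules.append((conditions, annotation, support, confidence))
    return rules


def annotate(props, rules):
    """Apply every rule whose conditions are all satisfied by the protein."""
    return {
        annotation
        for conditions, annotation, _, _ in rules
        if set(conditions) <= props
    }


RULES = mine_rules(TRAINING)
```

With this toy data, `IPR_A` alone is not a reliable predictor, but `IPR_A` combined with `Bacteria` is, so only the combined rule survives. Calling `annotate({"IPR_A", "Bacteria"}, RULES)` then transfers `annotation_X` to an unreviewed entry with those properties, while an entry with `IPR_A` in `Eukaryota` receives nothing.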
ARBA currently generates around 23,000 models, which annotate more than 85 million proteins, including 35 million that previously lacked any annotation. As a result, automatic annotation coverage in UniProtKB rose from 35% to 50%. All ARBA rules can be browsed on the UniProt website, and the relevant rules are also cited as evidence for annotations in UniProtKB entries.