findMySequence: a neural-network-based approach for identification of unknown proteins in X-ray crystallography and cryo-EM

Chojnowski, Grzegorz; Simpkin, Adam J.; Leonardo, Diego A.; Seifert-Davila, Wolfram; Vivas-Ruiz, Dan E.; Keegan, Ronan M.; Rigden, Daniel J.

Abstract

Although experimental protein-structure determination usually targets known proteins, chains of unknown sequence are often encountered. They can be purified from natural sources, appear as an unexpected fragment of a well-characterized protein, or be present as a contaminant. Regardless of its source, the unknown protein always requires characterization. Here, an automated pipeline is presented for the identification of protein sequences from cryo-EM reconstructions and crystallographic data. Its application to characterizing the crystal structure of an unknown protein purified from snake venom is presented. It is also shown that the approach can be successfully applied to the identification of protein sequences and the validation of sequence assignments in cryo-EM protein structures.

More information

WOS ID: WOS:000742078000014
Journal title: IUCrJ
Volume: 9
Publisher: International Union of Crystallography
Publication date: 2022
Start page: 86
End page: +
DOI: 10.1107/S2052252521011088

Notes: ISI