Machine learning accelerates classification of amyloids

Amyloids are proteins capable of forming fibrils, and many of them underlie serious diseases, including Alzheimer’s disease. A few hundred such peptides have been identified experimentally; however, experimental testing of all possible amino acid combinations is currently infeasible.

Instead, candidate amyloids are predicted by computational methods. The 3D profile method is a physicochemical approach that has generated the largest and best-known dataset, ZipperDB. However, this approach is still computationally expensive, and therefore many amyloids remain unknown.

In this paper the authors show that dataset generation can be accelerated using machine learning.

“The computational time was reduced from 18-20 CPU-hours (full 3D profile) to 0.5 CPU-hours (simplified 3D profile) to seconds (machine learning).”

The results were obtained with WEKA, a freely available machine learning toolkit that is well suited to DIY experimentation.
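For readers who want to try something similar in WEKA, a first step is getting labelled peptides into WEKA's ARFF input format. The Python sketch below is only an illustration: the function name `hexapeptides_to_arff`, the attribute names, and the encoding (one nominal attribute per residue position) are assumptions, not necessarily the features used in the paper.

```python
# Minimal sketch: write labelled six-residue peptides as ARFF text for WEKA.
# ASSUMPTION: one nominal attribute per position; the paper may use other features.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def hexapeptides_to_arff(labelled, relation="hexapeptides"):
    """Return ARFF text for an iterable of (sequence, is_amyloid) pairs."""
    nominal = "{" + ",".join(AMINO_ACIDS) + "}"
    lines = [f"@relation {relation}", ""]
    for pos in range(1, 7):
        lines.append(f"@attribute pos{pos} {nominal}")
    lines.append("@attribute class {amyloid,non_amyloid}")
    lines.append("")
    lines.append("@data")
    for seq, is_amyloid in labelled:
        seq = seq.upper()
        # Reject anything that is not exactly six standard residues.
        if len(seq) != 6 or any(r not in AMINO_ACIDS for r in seq):
            raise ValueError(f"not a standard hexapeptide: {seq!r}")
        label = "amyloid" if is_amyloid else "non_amyloid"
        lines.append(",".join(seq) + "," + label)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    examples = [
        ("NNQQNY", True),   # fibril-forming peptide from Sup35
        ("GGGGGG", False),  # placeholder label for illustration, not a measured result
    ]
    print(hexapeptides_to_arff(examples))
```

The resulting text can be saved with an `.arff` extension and opened in the WEKA Explorer, where any of its classifiers can be tried on the `class` attribute.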

The complete article is available as a PDF here:

2 Responses

  1. Beo says:

    I don’t quite get how the 3D profile works.

    • Peter says:

      This is the source paper that describes the technique by Thompson et al:

      From the ZipperDB documentation:

      “Fibrillation propensities were computed using a structure-based algorithm originally described by Thompson et al., PNAS, 2006. This algorithm uses the crystal structure of the fibril-forming peptide NNQQNY from the sup35 prion protein of Saccharomyces cerevisiae which makes up the cross-beta spine of amyloid-like fibrils (Nelson, et al. Nature, 2005). Each six-residue peptide not containing a proline from a putative protein sequence is threaded onto the NNQQNY structure backbone, and the energetic fit is evaluated by using the RosettaDesign program (Kuhlman et al., PNAS, 2000). To avoid problems with their disulphide bonding abilities, cysteines were substituted to serines during modeling. Additionally, the quality of the steric zipper interface is compared in terms of shape complementarity and surface area to other amyloid-like peptides reported by Sawaya et al. Nature, 2007. Based on these experimental amyloid-like peptide structures an energy threshold of -23 kcal/mol was chosen. Segments with energies equal to or below this threshold are deemed to have high fibrillation propensity.”
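The scanning and thresholding steps in the quoted documentation can be sketched in a few lines of Python. This is a rough illustration, not the ZipperDB implementation: the function names are made up, and the expensive part (threading onto the NNQQNY backbone and scoring with RosettaDesign) is represented by a caller-supplied `energy_of` function. The toy energies in the usage example are invented purely to make the sketch runnable.

```python
from typing import Callable, Iterator, List, Tuple

ENERGY_THRESHOLD = -23.0  # kcal/mol, as stated in the ZipperDB documentation
WINDOW = 6                # six-residue segments


def candidate_segments(sequence: str) -> Iterator[Tuple[int, str]]:
    """Yield (start index, segment) for every six-residue window without proline.

    Cysteines are replaced by serines, mirroring the substitution made
    during modeling.
    """
    seq = sequence.upper()
    for start in range(len(seq) - WINDOW + 1):
        segment = seq[start:start + WINDOW]
        if "P" in segment:
            continue  # proline-containing segments are skipped
        yield start, segment.replace("C", "S")


def high_propensity_segments(
    sequence: str,
    energy_of: Callable[[str], float],
) -> List[Tuple[int, str, float]]:
    """Return segments whose energy is equal to or below the threshold.

    `energy_of` is a HYPOTHETICAL stand-in for the structure-based scoring step.
    """
    hits = []
    for start, segment in candidate_segments(sequence):
        energy = energy_of(segment)
        if energy <= ENERGY_THRESHOLD:
            hits.append((start, segment, energy))
    return hits


if __name__ == "__main__":
    # Toy scoring function with invented numbers, for demonstration only.
    def toy_energy(segment: str) -> float:
        return -25.0 if segment == "NNQQNY" else -10.0

    for start, segment, energy in high_propensity_segments("ACNNQQNYP", toy_energy):
        print(start, segment, energy)
```

The machine learning approach in the article effectively replaces the slow `energy_of` step with a trained classifier, which is where the speed-up comes from.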

