Amyloids are proteins capable of forming fibrils, and many of them underlie serious diseases, including Alzheimer’s disease. A few hundred such peptides have been identified experimentally; however, experimental testing of all possible amino acid combinations is currently infeasible.
Instead, candidate amyloids are predicted by computational methods. The 3D profile method is a physicochemical approach that has generated the largest and best-known dataset, ZipperDB. However, this approach is still computationally expensive, so many amyloids remain unknown.
In this paper, the authors show that dataset generation can be accelerated using machine learning.
“The computational time was reduced from 18-20 CPU-hours (full 3D profile) to 0.5 CPU-hours (simplified 3D profile) to seconds (machine learning).”
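To put the quoted figures in perspective, the short Python sketch below works out the implied speedup factors. Only the 18-20 and 0.5 CPU-hour values come from the quote; the quote does not say how many seconds the machine learning step takes, so `ASSUMED_ML_SECONDS` is a hypothetical placeholder chosen purely for illustration.

```python
# CPU-hour figures taken from the quote above; everything else is illustrative.
FULL_PROFILE_HOURS = (18.0, 20.0)   # full 3D profile, low and high estimate
SIMPLIFIED_PROFILE_HOURS = 0.5      # simplified 3D profile


def speedup(before: float, after: float) -> float:
    """Return how many times faster the 'after' method is (same units for both)."""
    return before / after


# Speedup of the simplified 3D profile over the full 3D profile.
low, high = (speedup(h, SIMPLIFIED_PROFILE_HOURS) for h in FULL_PROFILE_HOURS)
print(f"Simplified vs. full 3D profile: {low:.0f}x to {high:.0f}x faster")

# "Seconds" is not quantified in the quote; 10 s is a hypothetical placeholder.
ASSUMED_ML_SECONDS = 10.0
ml_vs_simplified = speedup(SIMPLIFIED_PROFILE_HOURS * 3600, ASSUMED_ML_SECONDS)
print(f"ML (assumed {ASSUMED_ML_SECONDS:.0f} s) vs. simplified profile: "
      f"{ml_vs_simplified:.0f}x faster")
```

Under these numbers, the simplified 3D profile is roughly 36 to 40 times faster than the full one; the gain from machine learning depends entirely on the actual running time, which the quote gives only as "seconds".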
The results were obtained with WEKA, a freely available machine learning toolkit that is well suited to DIY experimentation.
The complete article is available as a PDF here: