Title

Distance Matrices as Protein Representations

Abstract

Representing protein sequences is a crucial problem in bioinformatics, since the performance of any data-driven model is limited by the information contained in its input features. A protein's biological function is dictated by its structure, and knowing that structure can help predict the protein's interactions with drug candidates or its Gene Ontology (GO) terms. Yet off-the-shelf protein representations do not contain structural information, because only a small fraction of the billions of known protein sequences have experimentally determined structures; such experiments are costly to run. AlphaFold, a recently introduced neural network-based structure prediction model, claims to predict protein structures with high accuracy. In this study, two-dimensional distance matrices generated from AlphaFold structure predictions are used as input features for two bioinformatics problems: drug-target interaction (DTI) prediction and GO term prediction. For DTI prediction, a state-of-the-art model that already uses two-dimensional protein features is employed as a baseline, and the effect of the distance matrices is measured through ablation studies. The same model is then adapted to GO term prediction, and its performance is compared with that obtained using off-the-shelf protein representations.
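
To make the input feature concrete, here is a minimal sketch of how a residue-level distance matrix could be computed from a predicted structure. It assumes a single-chain, single-model PDB-format file (one of the formats in which AlphaFold predictions are available) and takes the Cα atom of each residue as that residue's position. The atom choice and the helper names ca_coordinates and distance_matrix are illustrative assumptions, not the exact pipeline used in the thesis.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def ca_coordinates(pdb_lines: List[str]) -> List[Coord]:
    """Extract C-alpha coordinates from PDB-format ATOM records.

    Assumption: a single model and a single chain, as in a typical
    AlphaFold prediction file. Uses the fixed-column PDB layout:
    atom name in columns 13-16, x/y/z in columns 31-38, 39-46, 47-54.
    """
    coords: List[Coord] = []
    for line in pdb_lines:
        # Keep only ATOM records whose atom name is "CA" (one per residue).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            coords.append((x, y, z))
    return coords


def distance_matrix(coords: List[Coord]) -> List[List[float]]:
    """Return the symmetric L x L matrix of Euclidean distances (angstroms)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


# Usage: read a predicted structure and build its 2D feature.
# with open("prediction.pdb") as fh:   # hypothetical file name
#     d = distance_matrix(ca_coordinates(fh.readlines()))
```

For a protein of L residues the result is an L x L matrix with zeros on the diagonal, which is what allows it to be treated like a single-channel image by models built for two-dimensional inputs. Because its size grows quadratically with sequence length, very long proteins may need cropping or resizing before being fed to such a model.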

Supervisor(s)

MEHMET DINC

Date and Location

2022-09-02 11:00:00

Category

MSc Thesis