KD4v: Comprehensible Knowledge Discovery System for Missense Variants


Tien-Dao Luu, Vincent Walter, Hoan Nguyen and Olivier Poch
Institute of Genetics and Molecular and Cellular Biology (IGBMC), Illkirch, France
http://decrypthon.igbmc.fr/kd4v


Introduction 

A major challenge in the post-genomic era is a better understanding of how human genetic alterations involved in disease affect the gene products.



The KD4v server characterizes missense variants and predicts their phenotypic effect (deleterious or neutral).



Each variant is described by 16 predicates annotated from the MSV3d database, covering conservation, physico-chemical, functional and 3D structural features.



The server provides a set of rules learned by Inductive Logic Programming (ILP).



These rules are interpretable by non-expert humans and are used to accurately predict the deleterious/neutral status of an unknown mutation.

Method & Implementation

Generating knowledge: applying learning methods that give comprehensible results, here Inductive Logic Programming (ILP).

ILP = machine learning + logic programming

Schema: positive examples + negative examples + background knowledge => hypothesis
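The ILP schema above can be illustrated with a small Python sketch. It is not Aleph's search procedure; it only shows the scoring idea that a candidate hypothesis should cover many positive (deleterious) examples and few negative (neutral) ones. The record fields, the toy values and the names `coverage` and `candidate` are assumptions made for this illustration.

```python
from typing import Callable, Dict, List

# Hypothetical background-knowledge record for one variant.
Variant = Dict[str, object]


def coverage(hypothesis: Callable[[Variant], bool],
             positives: List[Variant],
             negatives: List[Variant]) -> Dict[str, int]:
    """Count how many positive and negative examples a hypothesis covers."""
    pos_covered = sum(1 for v in positives if hypothesis(v))
    neg_covered = sum(1 for v in negatives if hypothesis(v))
    return {"pos_covered": pos_covered, "neg_covered": neg_covered}


# Toy examples with invented values, for illustration only.
positives = [
    {"stability": "decrease", "gain_contact": 2},
    {"stability": "decrease", "gain_contact": 0},
]
negatives = [
    {"stability": "no_change", "gain_contact": 1},
]


def candidate(v: Variant) -> bool:
    """Candidate hypothesis: a variant is deleterious if stability decreases."""
    return v["stability"] == "decrease"


result = coverage(candidate, positives, negatives)
print(result)
```

A learner would compare many such candidates and keep those with high positive coverage and low negative coverage.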

[Figure: KD4v workflow. Labels: Missense Variant; Annotation service (physicochemical, Conservation, Localisation, Accessibility, Stability, Contacts); Selection of structural mutations; Training (Aleph/Prolog); Selected Rules; Interpretable rules, human-interpretable (if … then …); Prediction services (Web, SOAP API); new mutation; biologists.]

Some induced rules obtained by ILP

Prolog code:

deleterious(A) :- conservation_class(A, sub_family_conservation), secondary_struc(A, no_helix_no_sheet), gain_contact(A, B), B >= 1, stability(A, decrease).

ILP rules are transformed into English sentences, and each prediction reports the class (neutral or deleterious) together with the decision rules.

Dataset UniProt/PolyPhen-2: 8000 variants with 3D structure. Cross-validation:

Method  TP   FP   FN   TN   POS  NEG  Pre   Recall  Acc   F-m (1)
SIFT    398  38   260  260  658  298  0.91  0.60    0.69  0.73
PP2     576  111  77   184  658  298  0.84  0.88    0.80  0.86
KD4v    487  94   171  204  658  298  0.84  0.74    0.72  0.79
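The derived columns of the table can be recomputed from the confusion counts. The sketch below assumes the usual definitions of precision, recall and accuracy, and assumes that "F-m (1)" denotes the F1 score; the function name `classification_metrics` is introduced here for illustration only.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute precision, recall, accuracy and F1 from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": round(precision, 2),
        "recall": round(recall, 2),
        "accuracy": round(accuracy, 2),
        "f1": round(f1, 2),
    }


# Counts taken from the SIFT and KD4v rows of the cross-validation table.
sift = classification_metrics(tp=398, fp=38, fn=260, tn=260)
kd4v = classification_metrics(tp=487, fp=94, fn=171, tn=204)
print(sift)
print(kd4v)
```

Under these definitions the SIFT and KD4v rows reproduce the reported values to two decimals.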

The Prolog rule above states that a mutation A is deleterious if:
• The mutated residue belongs to the "subfamily conservation" class.
• The residue is found in neither an α-helix nor a β-sheet.
• The number of contacts gained after the point mutation is greater than or equal to 1.
• The stability of the protein is decreased after the point mutation.
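The four conditions can also be read as a simple conjunction. The following Python sketch mirrors the Prolog rule as a predicate over one variant record; the dictionary layout and the example values are hypothetical and are not the KD4v or MSV3d data format.

```python
def is_deleterious(variant: dict) -> bool:
    """Return True when all four conditions of the induced rule hold."""
    return (
        variant["conservation_class"] == "sub_family_conservation"
        and variant["secondary_struc"] == "no_helix_no_sheet"
        and variant["gain_contact"] >= 1
        and variant["stability"] == "decrease"
    )


# Hypothetical variant record, with invented values for illustration.
example = {
    "conservation_class": "sub_family_conservation",
    "secondary_struc": "no_helix_no_sheet",
    "gain_contact": 2,
    "stability": "decrease",
}

# Same record, but the mutation gains no contacts, so the rule does not fire.
no_contacts = dict(example, gain_contact=0)

print(is_deleterious(example))
print(is_deleterious(no_contacts))
```

As in Prolog, a `False` result only means that this particular rule does not fire; it does not by itself classify the variant as neutral.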

Cancer-associated gene MSH2: variants with 3D structure

Method  TP  FP  FN  TN  POS  NEG  Pre   Recall  Acc   F-m (1)
SIFT    33  2   39  10  72   12   0.94  0.45    0.51  0.62
PP2     47  4   25  8   72   12   0.92  0.65    0.65  0.76
KD4v    46  3   26  9   72   12   0.93  0.64    0.65  0.76

The prediction performance of KD4v is comparable to that of the other methods (SIFT and PP2).

These ILP rules can be used, for example, to uncover relationships between the deleterious effect of a mutation and the multi-class conservation pattern, or the type of physico-chemical alteration (e.g., size, charge and hydrophobicity) introduced by the substitution.

Future work:
• Prediction with 3D structure: adding structural surface topology descriptions of the proteins.
• Prediction without 3D structure.
• SVM + ILP.

References
[1] Luu, T.-D., Rusu, A.-M., Walter, V., et al. (2012b). MSV3d: database of human MisSense Variants mapped to 3D protein structure. Database (Oxford) 2012, bas018.
[2] Luu, T.-D., Rusu, A., Walter, V., et al. (2012a). KD4v: Comprehensible Knowledge Discovery System for Missense Variant. Nucleic Acids Res. 40, W71–75.
[3] Friedrich, A., et al. (2010). SM2PH-db: an interactive system for the integrated analysis of phenotypic consequences. Human Mutation.

Acknowledgements: This work was funded by the Association Française contre les Myopathies (AFM), the Vietnam Ministry of Education and Training, the Institut National de la Santé et de la Recherche Médicale (INSERM), the Centre National de la Recherche Scientifique (CNRS), and the Université de Strasbourg.
