For research purposes, I focused on machine learning and tried to learn as much linear algebra and statistics as possible to build up my fundamentals. Along the way, I learned a range of machine learning and data mining techniques for classification, clustering, feature selection and feature extraction, among them SVM, KNN, decision trees, Student's t-test, the Kruskal-Wallis test, the Mann-Whitney-Wilcoxon test, mRMR and PCA. As my publications show, I applied this knowledge to bioinformatics and worked specifically on the impact of feature selection methods in this field.
Paper Title: Metabolomic Biomarker Identification for Lung Cancer By Combining Multiple Statistical Approaches
Tahsin Masrur, Md. Al Mehedi Hasan, Md. Nazrul Islam Mondal
2019 International Conference on Electrical, Computer and Communication Engineering (ECCE), 7-9 February, 2019
Cox's Bazar, Bangladesh
Faculty of Electrical and Computer Engineering, Chittagong University of Engineering & Technology, Bangladesh
Metabolomic biomarkers are tools that can be used for early disease prediction and for drug design for diseases like lung cancer. Knowing the most differentially expressed metabolites makes it much more likely that lung cancer is diagnosed early, which can reduce the mortality rate, and the same metabolites are also valuable in drug design. Previous works have discovered biomarkers for various diseases. However, this is still far from sufficient, since reducing the number of biomarkers while maintaining good classification accuracy remains an urgent issue in a field where people's lives are at stake. To contribute further, in this paper we identified the influential metabolites in plasma and serum blood samples for lung cancer and then selected biomarkers from them. We first applied a parametric test (Student's t-test) and two non-parametric tests (the Kruskal-Wallis and Mann-Whitney-Wilcoxon tests) to identify the influential metabolites. We also separated the up-regulated and down-regulated metabolites using fold change (FC) values and a heatmap plot. We used an SVM classifier to verify that our set of influential metabolites yields good accuracy, and ROC curve analysis to rank the metabolites and choose biomarkers. Our analysis resulted in 28 influential (p-value < 0.05) metabolites from the plasma sample and 13 influential (p-value < 0.05) metabolites from the serum sample. Finally, 10 metabolites were chosen from each sample as its biomarkers. All the files and code used in our work are available at https://github.com/Zeronfinity/LungCancerBiomarkers.
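The ranking step described above (ROC analysis to order metabolites, fold change to label their direction) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in the linked repository. Fold change is taken as mean(case) / mean(control), metabolites are ranked by max(AUC, 1 - AUC) so that strongly down-regulated metabolites count as discriminative too, and the toy data is hypothetical. The AUC is computed through its known identity with the Mann-Whitney U statistic, which ties the ROC step back to one of the tests used for screening.

```python
from math import log2
from statistics import mean


def auc(case, control):
    """ROC AUC for one metabolite: the probability that a random case value
    exceeds a random control value (ties count as 0.5). This equals the
    Mann-Whitney U statistic divided by len(case) * len(control)."""
    wins = 0.0
    for a in case:
        for b in control:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(case) * len(control))


def rank_metabolites(data, top_k):
    """Rank metabolites by how well they separate cases from controls.

    data maps a metabolite name to (case_values, control_values); values are
    assumed to be positive concentrations so that the fold change is defined.
    Returns (name, score, log2_fold_change, direction) tuples, best first."""
    rows = []
    for name, (case, control) in data.items():
        fold_change = mean(case) / mean(control)  # assumed FC definition
        a = auc(case, control)
        # An AUC near 0 separates the groups as well as one near 1 (only the
        # direction is reversed), so rank on max(a, 1 - a).
        score = max(a, 1.0 - a)
        if fold_change > 1:
            direction = "up"
        elif fold_change < 1:
            direction = "down"
        else:
            direction = "unchanged"
        rows.append((name, score, log2(fold_change), direction))
    rows.sort(key=lambda row: row[1], reverse=True)  # stable for equal scores
    return rows[:top_k]


# Hypothetical toy data: three metabolites, three cases and three controls each.
toy = {
    "m1": ([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]),  # higher in cases
    "m2": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),  # lower in cases
    "m3": ([2.0, 3.0, 4.0], [3.0, 2.0, 4.0]),  # no separation
}
for row in rank_metabolites(toy, top_k=2):
    print(row)
```

Here m1 and m2 both reach the maximum score of 1.0 (one up-regulated, one down-regulated), while m3 sits at 0.5 and is dropped; keeping the top k per sample mirrors choosing a fixed number of biomarkers from the ranked list.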
Paper Title: Identification of Metabolomic Biomarker using Multiple Statistical Techniques and Recursive Feature Elimination
Tahsin Masrur, Md. Al Mehedi Hasan
International Conference on Computer, Communication, Chemical, Materials and Electronic Engineering (IC4ME2), 11-12 July, 2019
University of Rajshahi, Rajshahi, Bangladesh
Faculty of Engineering, University of Rajshahi, Bangladesh
The mortality rate of diseases like lung cancer can be decreased significantly by increasing the chance of early diagnosis. Identifying differentially expressed (DE) metabolites may contribute considerably to this, and also to drug design. In the past, several kinds of approaches have been attempted to discover biomarkers for diseases. Nonetheless, discovering compact biomarker sets while maintaining satisfactory classification performance is still a challenge. Therefore, to contribute further to this field, we have identified biomarkers from the DE metabolites we found in plasma and serum blood samples of lung cancer. Student's t-test and the Kruskal-Wallis and Mann-Whitney-Wilcoxon tests were applied to identify the DE metabolites. A cluster heatmap plot and fold change values were used to differentiate between up- and down-regulated metabolites. Finally, the recursive feature elimination (RFE) method was used to rank the metabolites and select biomarkers from them. An SVM classifier was used to assess classification performance with our DE metabolites and biomarkers. We found 28 DE metabolites in the plasma dataset and 13 in the serum dataset (p-value < 0.05). In the end, 8 metabolites were selected from the plasma sample and 5 from the serum sample as the metabolomic biomarkers. The relevant files and code for our work can be found at https://github.com/Zeronfinity/LungCancerBiomarkerRFE.
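A minimal sketch of the RFE step, assuming a linear SVM is the estimator whose weights drive the elimination (the SVM-RFE variant; the abstract does not say which estimator was used inside RFE). The `train_linear_svm` helper is hypothetical: it trains a linear SVM by full-batch subgradient descent on the L2-regularized hinge loss so the example runs with the standard library only, and its hyperparameters are illustrative. In practice, scikit-learn's `RFE` wrapped around `SVC(kernel="linear")` does the same job.

```python
import random


def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=300):
    """Hypothetical helper: full-batch subgradient descent on the
    L2-regularized hinge loss. X is a list of feature rows, y holds labels
    in {-1, +1}. Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of (lam / 2) * ||w||^2
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def svm_rfe(X, y, n_keep):
    """Recursively retrain and drop the feature with the smallest |weight|
    until n_keep features remain. Returns the surviving column indices."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        sub = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(sub, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return active


# Hypothetical toy data: column 0 carries the class signal,
# columns 1-3 are small uniform noise.
rng = random.Random(0)
X, y = [], []
for i in range(40):
    label = 1 if i % 2 == 0 else -1
    signal = [2.0 * label + rng.uniform(-0.5, 0.5)]
    noise = [rng.uniform(-0.5, 0.5) for _ in range(3)]
    X.append(signal + noise)
    y.append(label)

print(svm_rfe(X, y, n_keep=1))
```

Retraining after every removal, rather than ranking once from a single fit, is what distinguishes RFE from plain weight-based ranking: the remaining weights can shift once a correlated feature is gone.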
Paper Title: Predicting N1- and N6-methyladenosine RNA Modifications using Hybrid Feature Selection Approach
Tasfin Jayed, Md. Al Mehedi Hasan, Tahsin Masrur (also presenter)
Dr. Fatema Rashid Best Paper Award
International Conference on Advances in Electrical Engineering (ICAEE), 26-28 September, 2019
Independent University, Dhaka, Bangladesh
Department of Electrical and Electronic Engineering, Independent University, Bangladesh (IUB)
RNA modification refers to local structural changes or the addition of new chemical groups in nucleotides. It affects some crucial biological activities and is also linked to several serious diseases, e.g. leukemia, breast cancer and Zika virus infection. That is why identifying RNA modifications is of great importance. N1-methyladenosine (m1A) and N6-methyladenosine (m6A) are two frequent modifications that occur at the adenosine site of RNA. So far, various methods, e.g. iRNA-3typeA and RAM-ESVM, have been developed to predict these modifications. However, these methods can be improved further with the help of multiple feature selection approaches. In this paper, we have extensively analyzed the effect of multiple feature selection methods and proposed a hybrid feature selection approach. This approach keeps the features that are selected in common by Student's t-test, the Kruskal-Wallis test and the minimum redundancy maximum relevance (mRMR) method. Applying this approach with 10-fold cross-validation and a support vector machine classifier, we obtained 99.37% and 91.02% accuracy for m1A and m6A (Homo sapiens), and 89.97% and 98.17% accuracy for m1A and m6A (Mus musculus), respectively.
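The hybrid rule, keeping only the features that every selector agrees on, amounts to a set intersection. The sketch below pairs it with a greedy mRMR in its difference form (relevance minus mean redundancy, both measured as mutual information on discrete values). The discretized toy features, the difference form of mRMR and the precomputed t-test and Kruskal-Wallis index sets are assumptions for illustration, not details taken from the paper.

```python
from collections import Counter
from math import log


def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # sum over observed pairs of p(x, y) * log(p(x, y) / (p(x) * p(y)))
    return sum((c / n) * log((c * n) / (pa[x] * pb[y])) for (x, y), c in pab.items())


def mrmr(columns, labels, k):
    """Greedy mRMR (difference form): repeatedly pick the feature maximizing
    I(f; labels) minus the mean of I(f; s) over already-selected features s."""
    remaining = list(range(len(columns)))
    selected = []
    while remaining and len(selected) < k:
        def score(j):
            relevance = mutual_information(columns[j], labels)
            if not selected:
                return relevance
            redundancy = sum(mutual_information(columns[j], columns[s]) for s in selected)
            return relevance - redundancy / len(selected)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def hybrid_selection(*feature_sets):
    """Keep only the features chosen by every selector (set intersection)."""
    return set.intersection(*(set(s) for s in feature_sets))


# Hypothetical discretized toy data.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # feature 0 mirrors the label
    [0, 1, 0, 1, 0, 1, 0, 1],  # feature 1 is independent of the label
    [0, 0, 0, 1, 0, 1, 1, 1],  # feature 2 is partly informative
]
mrmr_selected = mrmr(columns, labels, k=1)
# Hypothetical index sets standing in for the t-test and Kruskal-Wallis picks.
ttest_selected = {0, 2}
kw_selected = {0, 1, 2}
print(hybrid_selection(ttest_selected, kw_selected, mrmr_selected))
```

Intersecting the three selections trades recall for agreement: a feature survives only if a parametric test, a non-parametric test and an information-theoretic criterion all consider it useful.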