The general focus of Dr. Valafar’s current research is the application and transfer of engineering techniques to biological systems. In addition, this transfer of information can be reversed to implement efficient optimization and security (immunity) techniques inspired by biological models. His specific research is divided into two main categories: Computational Biology and Computational Medicine.
Our specific interest in the area of Computational Biology is the problem of protein folding. Current approaches to protein folding are divided into experimental and computational means of structure determination. Our approach can be viewed as a hybrid of the two. While traditional experimental methods of structure determination remain the most reliable approaches to characterizing the structure and dynamics of biomolecules, they impose tangible limitations on rapid and accurate structure determination. On the other hand, the recently emerging computational approaches offer a very cost-effective means of structure determination but suffer from unconfirmed results.
Our hybrid approach utilizes only 2% of the experimental data required for traditional methods of structure determination while reducing the computational requirement of the protein folding problem to polynomial time complexity. This is possible through careful consideration of the source of experimental data. In our studies, we choose Residual Dipolar Couplings (RDCs) as the main source of experimental data.
Current investigations of the molecular basis for a number of diseases have been unsuccessful. This lack of success has steered future investigation toward the study of disparate data in order to understand the relationship between the environment and the genome. In our view, a very important aspect of this interaction is neglected: its temporal component. Examples borrowed from control engineering in the design of stable systems stress the importance of the temporal component in the modeling of any electrical circuit. The same argument, and therefore the same engineering techniques, should be transferable to the study of stable biological systems. Our long-term goal is therefore to study the temporal effects of external stimulants on the behavior of cells with a given genome. Our short-term goal is to characterize the physiological state response of individual patients. Our studies include the response of Sickle Cell Anemia patients to Hydroxyurea and differential expression profiling in patients suffering from cervical cancer.
Our lab has also delved into other projects that aim to incorporate technology and computer science concepts into the medical field. In general, researchers who wish to evaluate medical systems and intervention techniques are forced to resort to more primitive ways of evaluation (handwritten surveys, interviews, etc.). It is our goal to create specialized systems that will help researchers tackle huge medical questions with ease.
Proteins are a type of macromolecule that performs important functions within the body. Characterizing the structure of a given protein is often a requisite step in discovering its particular function. Some diseases involve mutations in certain proteins, which can cause "misfolding" and lead to dysfunction. Therefore, knowing the structure of a protein can lead to a better understanding of the causes of these diseases and inform intelligent drug design to combat them. In our lab we use the software REDCRAFT to fold proteins from Residual Dipolar Couplings (RDCs) to study their structures. Our approach utilizes only 2% of the experimental data required for traditional methods of structure determination while reducing the computational requirement of the protein folding problem to polynomial time complexity.
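To make the role of RDC data more concrete, the following is a minimal sketch of how a single RDC can be back-calculated from a bond orientation, using the standard relation D = D_max * v^T S v, where v is the unit internuclear vector and S is a symmetric, traceless order tensor. It is not REDCRAFT code. The function name `rdc`, the tensor values, and the value of `D_MAX_HZ` are illustrative assumptions.

```python
import math


def rdc(vector, order_tensor, d_max):
    """Back-calculate one RDC as d_max * v^T S v for a bond vector v.

    vector       -- (x, y, z) internuclear vector; normalized internally
    order_tensor -- 3x3 symmetric, traceless order tensor S (nested lists)
    d_max        -- maximum dipolar coupling for this nucleus pair, in Hz
    """
    norm = math.sqrt(sum(c * c for c in vector))
    v = [c / norm for c in vector]
    return d_max * sum(
        order_tensor[i][j] * v[i] * v[j] for i in range(3) for j in range(3)
    )


# Hypothetical order tensor, already expressed in its principal axis frame.
# The diagonal sums to zero, as required for a traceless tensor.
S = [
    [-2e-4, 0.0, 0.0],
    [0.0, -3e-4, 0.0],
    [0.0, 0.0, 5e-4],
]

# Illustrative placeholder only; the real value depends on the nucleus pair
# and the bond length convention in use.
D_MAX_HZ = 20000.0

for name, bond in [("along z", (0, 0, 1)), ("along x", (1, 0, 0)), ("tilted", (1, 1, 1))]:
    print(name, rdc(bond, S, D_MAX_HZ))
```

Because each RDC depends only on the orientation of a bond relative to a common alignment frame, a small number of measurements per residue can constrain backbone geometry.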
Casey A. Cole, D. Ishimaru, M. Hennig, H. Valafar. (2016). Structure Calculation of α, α/β, β Proteins From Residual Dipolar Coupling Data Using REDCRAFT. In: H. R. Arabnia, Q. N. Tran (Eds.), Emerging Trends in Applications and Infrastructures for Computational Biology, Bioinformatics, and Systems Biology. Morgan Kaufmann, Imprint of Elsevier, Cambridge, MA, pp. 73–88.
Simin, M., Irausquin, S., Cole, C. A., Valafar, H. (2014). Improvements to REDCRAFT: a software tool for simultaneous characterization of protein backbone structure and dynamics from residual dipolar couplings. Journal of Biomolecular NMR. http://doi.org/10.1007/s10858-014-9871-x, PMID: 25403759
Some proteins within the body undergo some form of dynamics to perform their proper function. For example, proteins that metabolize certain ligands (small molecules such as metals or compounds) must "open" and "close" to bind and release the molecules during processing. These motions are of great interest to the scientific community, especially for drug design. These dynamical regions can be used as targets for personalized medicine, in which drugs can be designed in such a way as to utilize proteins with certain dynamical regions to metabolize the drug faster. In our lab we are developing an approach to identify and characterize protein dynamics using RDC data and the REDCRAFT software package. So far we have shown that our method is capable of identifying discrete motions of 30 degrees or greater in magnitude with high accuracy. In addition to reconstructing the various states of dynamics, our method is capable of estimating the relative occupancy of each state.
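The idea of relative occupancy can be illustrated with a simplified two-state model. The sketch below assumes that the two states exchange quickly enough that each observed RDC is the population-weighted average of the RDCs of the two states, and that the per-state RDCs are already known in a common alignment frame. Under those assumptions the occupancy has a closed-form least-squares estimate. This is not the REDCRAFT procedure; `estimate_occupancy` and all data values are hypothetical.

```python
def estimate_occupancy(observed, state_a, state_b):
    """Least-squares occupancy p of state A in a two-state average.

    Model (assumption): observed[i] = p * state_a[i] + (1 - p) * state_b[i].
    Minimizing the squared residual over p gives
        p = sum((obs - b) * (a - b)) / sum((a - b) ** 2),
    which is then clipped to the physically meaningful range [0, 1].
    """
    numerator = sum(
        (o - b) * (a - b) for o, a, b in zip(observed, state_a, state_b)
    )
    denominator = sum((a - b) ** 2 for a, b in zip(state_a, state_b))
    if denominator == 0:
        raise ValueError("The two states predict identical RDCs.")
    return min(1.0, max(0.0, numerator / denominator))


# Hypothetical RDCs (Hz) predicted for each state at three residues.
state_a = [10.0, -4.0, 6.0]
state_b = [2.0, 8.0, -6.0]

# Synthetic "measurement" generated with 70% occupancy of state A.
observed = [0.7 * a + 0.3 * b for a, b in zip(state_a, state_b)]

print(estimate_occupancy(observed, state_a, state_b))
```

With noisy data the estimate is only as good as the separation between the RDCs predicted for the two states, which is consistent with small motions being harder to detect than large ones.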
Casey A. Cole, Rishi Mukhopadhyay, Hanin Omar, Mirko Hennig, and Homayoun Valafar. (2016). Structure Calculation and Reconstruction of Discrete-State Dynamics from Residual Dipolar Couplings. J. Chem. Theory Comput., 12 (4), pp 1408–1422. DOI: 10.1021/acs.jctc.5b01091
Residual Dipolar Couplings (RDCs) have been shown to provide an effective means for structure calculation of proteins, even in the more challenging conditions of internal dynamics or proteins that form complexes. Traditional approaches that utilize RDCs require the data to be "assigned", meaning that it must be known which particular residue in the protein produces each RDC value. This step is usually extremely time consuming for experimentalists. Probability Density Profile Analysis (PDPA) is a tool that bypasses assignment by using multiple correlated unassigned RDC data sets and an estimate of experimental error to produce a probability density function (PDF) for the true distribution of RDCs. This PDF is used as a structural fingerprint, which is then compared to simulated RDC data from a library of structures to determine the best match. Applications include novel protein fold target selection for structural genomics initiatives, structural homologue detection for structure determination by threading, and confirmation of computationally modeled protein structures from a small amount of experimental data. In recent and ongoing work, a derivative of PDPA (nD-PDPA) has shown great promise in the refinement of protein structures.
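The fingerprint comparison described above can be sketched in one dimension. The example assumes a Gaussian kernel whose width equals the experimental error, a single RDC data set rather than multiple correlated sets, and a summed absolute difference as the comparison score. These choices, the function names, and the data are illustrative assumptions and not the published PDPA scoring.

```python
import math


def density_profile(rdcs, error, grid):
    """Gaussian kernel density of unassigned RDC values sampled on a grid.

    The experimental error is used as the kernel width, so each measurement
    contributes a bump as wide as its uncertainty.
    """
    scale = 1.0 / (len(rdcs) * error * math.sqrt(2.0 * math.pi))
    return [
        scale * sum(math.exp(-0.5 * ((x - r) / error) ** 2) for r in rdcs)
        for x in grid
    ]


def profile_distance(p, q):
    """Summed absolute difference between two profiles on the same grid."""
    return sum(abs(a - b) for a, b in zip(p, q))


def best_match(experimental, library, error, grid):
    """Name of the library structure whose simulated profile is closest."""
    target = density_profile(experimental, error, grid)
    return min(
        library,
        key=lambda name: profile_distance(
            target, density_profile(library[name], error, grid)
        ),
    )


grid = [x * 0.5 for x in range(-40, 41)]  # -20 Hz to 20 Hz

# Unassigned data: the order of the values carries no residue information.
experimental = [-8.1, 3.2, 9.7, -1.9, 6.1]

# Hypothetical simulated RDCs for two candidate structures.
library = {
    "fold_A": [9.5, -2.0, 3.0, 6.0, -8.0],
    "fold_B": [-15.0, -12.0, 14.0, 0.5, 1.0],
}

print(best_match(experimental, library, error=1.0, grid=grid))
```

Because the profile is built from the distribution of values alone, shuffling the input list leaves it unchanged, which is why no assignment step is needed.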
A. Fahim, S. Irausquin, H. Valafar. (2016). nD-PDPA: n-Dimensional Probability Density Profile Analysis. In: H. R. Arabnia, Q. N. Tran (Eds.), Emerging Trends in Applications and Infrastructures for Computational Biology, Bioinformatics, and Systems Biology. Morgan Kaufmann, Imprint of Elsevier, Cambridge, MA, pp. 179–194.
Fahim, A., Mukhopadhyay, R., Yandle, R., Prestegard, J. H., Valafar, H. (2013). Protein Structure Validation and Identification from Unassigned Residual Dipolar Coupling Data Using 2D-PDPA. Molecules (Basel, Switzerland), 18(9), 10162–88. doi:10.3390/molecules180910162, PMID: 23973992. (MCB-0644195, P20 RR-016461)
Traditionally, researchers have relied on sequence alignments to identify similar regions of proteins in order to classify protein function. One major drawback of this technique is that sequence similarity does not necessarily guarantee structural similarity. Therefore, proteins with similar sequences may not have the same active sites, as was once thought in the community. Multiple structure alignments have received considerable attention as an alternative to multiple sequence alignments. msTALI is a multiple structure alignment algorithm that utilizes several types of information, including torsion angles, backbone atom positions, surface accessibility, residue type, and others. It combines this information into an efficient progressive alignment algorithm. Applications include protein core extraction, active site identification, and many others. msTALI allows the user to specify the extent to which each type of information is used, which makes the algorithm applicable to a wide variety of problems. Currently our lab is heavily investigating msTALI's utility for active site identification. Current target proteins include ATPases and kinases.
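As a rough illustration of user-weighted information types, the sketch below scores how similar two residues are by combining three of the features listed above. The feature scaling, the weight names, and the function `residue_similarity` are hypothetical and are not taken from msTALI; the example only shows how weights can shift the emphasis between structural and sequence information.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def residue_similarity(r1, r2, weights):
    """Weighted similarity in [0, 1] between two residues.

    Each residue is a dict with 'phi' and 'psi' torsion angles in degrees,
    'accessibility' as a fraction in [0, 1], and a one-letter 'type'.
    'weights' must give a non-negative weight for the keys 'torsion',
    'accessibility', and 'type', with at least one weight above zero.
    """
    torsion = 1.0 - (
        angle_diff(r1["phi"], r2["phi"]) + angle_diff(r1["psi"], r2["psi"])
    ) / 360.0
    accessibility = 1.0 - abs(r1["accessibility"] - r2["accessibility"])
    same_type = 1.0 if r1["type"] == r2["type"] else 0.0
    terms = {"torsion": torsion, "accessibility": accessibility, "type": same_type}
    total = sum(weights[key] for key in terms)
    return sum(weights[key] * terms[key] for key in terms) / total


# Two hypothetical residues: same backbone conformation, different amino acid.
helix_ala = {"phi": -60.0, "psi": -45.0, "accessibility": 0.2, "type": "A"}
helix_leu = {"phi": -60.0, "psi": -45.0, "accessibility": 0.2, "type": "L"}

structure_only = {"torsion": 1.0, "accessibility": 1.0, "type": 0.0}
sequence_only = {"torsion": 0.0, "accessibility": 0.0, "type": 1.0}

print(residue_similarity(helix_ala, helix_leu, structure_only))
print(residue_similarity(helix_ala, helix_leu, sequence_only))
```

A pairwise score of this kind is the sort of quantity a progressive alignment could consume; the alignment step itself is outside the scope of this sketch.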
Devaun McFarland (email@example.com)
Devaun McFarland, Caroline Bullock, Benjamin Mueller, Homayoun Valafar, Application of msTALI in ATPase Active Site Identification, Proceedings of the International Conference on Bioinformatics & Computational Biology (BIOCOMP), July 2016, Las Vegas, NV
Devaun McFarland, Homayoun Valafar, Utility of msTALI in Protein Active Site Identification, SE Regional IDeA Meeting, Nov 15-17 2013, Little Rock, AR
Shealy, P., Valafar, H. (2012). Multiple structure alignment with msTALI. BMC bioinformatics, 13(1), 105. doi:10.1186/1471-2105-13-105, PMID:22607234. (NIH-1R01GM081793, MCB-0644195)
Following a mass casualty incident (MCI), a healthcare facility will experience a surge of patients, many of them arriving quickly at the Emergency Department (ED) in their personal vehicles. Emergency responders and hospital personnel will use triage to rapidly assess patients and prioritize their care with the goal of saving as many lives as possible. Following a chemical exposure, local EDs may receive a surge of victims before any chemical identification information is available, thus complicating treatment decisions. Small communities are additionally challenged because they are ill-prepared to manage any surge of patients, regardless of the cause. Alongside our partners in the College of Nursing, our lab is currently developing a robust computer-based informatics tool to improve early chemical identification and to enhance patient processing and triage in the ED following an MCI. Our continuous triage process will monitor and aggregate data across all patients to provide ongoing situational awareness. Evolving wireless physiologic and mobile sensing technology and associated signal analysis prototypes will be explored and incorporated as appropriate for the ED MCI application. Our current development targets Android-based systems.
Nicholas Boltin, Daniel Vu, Bethany Janos, Alyssa Shofner, Joan Culley, Homayoun Valafar, An AI Model for Rapid and Accurate Identification of Chemical Agents in Mass Casualty Incidents, Proceedings of the International Conference on Health Informatics and Medical Systems (HIMS), July 2016, Las Vegas, NV
Diseases resulting from prolonged smoking are the most common preventable causes of death in the world today. A great deal of research is being conducted to find comprehensive, effective ways to aid in smoking cessation. There are myriad websites that provide services to smokers, such as access to resources, online support, and help lines. However, most of these sites do not provide a mobile app for an increasingly mobile population. In recent years, smoking cessation mobile applications have also been developed. Most of these apps simply allow the user to enter the number of cigarettes that they would usually smoke in a day, "self-report" any cigarettes that they did smoke, and then calculate the money that they would have saved if they hadn't smoked. This self-reporting process is usually unreliable because users cannot always be trusted to keep up with inputting data. Currently our lab is developing an Android-based app for smartwatches that will automatically detect smoking gestures, thus bypassing the issue of self-reporting. Furthermore, our partners in Public Health are looking at ways to utilize our app to provide more meaningful intervention for the user. Our detection mechanism utilizes custom-built pattern recognition models in conjunction with accelerometer data collected directly from the smartwatch to detect smoking gestures.
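A common first step when applying pattern recognition to accelerometer streams is to cut the signal into overlapping windows and summarize each window with a few features. The sketch below shows only that preprocessing step, under assumed window sizes and an assumed feature set. It is not the lab's detection model, and the names `sliding_windows` and `window_features` are hypothetical.

```python
import math


def sliding_windows(samples, size, step):
    """Yield consecutive windows of `size` samples, advancing by `step`."""
    for start in range(0, len(samples) - size + 1, step):
        yield samples[start:start + size]


def window_features(window):
    """Summarize one window of (x, y, z) accelerometer samples.

    Returns the mean and variance of the acceleration magnitude, two simple
    features that a downstream classifier could take as input.
    """
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in window]
    mean = sum(magnitudes) / len(magnitudes)
    variance = sum((m - mean) ** 2 for m in magnitudes) / len(magnitudes)
    return {"mean_magnitude": mean, "variance": variance}


# Synthetic stream: a wrist at rest (gravity only) followed by a short movement.
stream = [(0.0, 0.0, 1.0)] * 8 + [(0.6, 0.0, 0.8), (0.0, 1.5, 0.0)] * 2

for window in sliding_windows(stream, size=4, step=4):
    print(window_features(window))
```

Overlapping windows (a step smaller than the window size) reduce the chance that a gesture is split across a window boundary, at the cost of more windows to classify.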
Casey A. Cole, Bethany Janos, Dien Anshari, James F. Thrasher, Scott Strayer, Homayoun Valafar, Recognition of Smoking Gesture Using Smart Watch Technology, Proceedings of the International Conference on Health Informatics and Medical Systems (HIMS), July 2016, Las Vegas, NV
Mutations within an organism's DNA can cause differential expression of genes throughout the body. Discovering the differences between, for example, a healthy cell and a tumorous cell could offer great insight into the mechanisms of certain diseases. One way of studying these differences is to compare the transcriptomes of the two cells. A transcriptome is the collection of all the messenger RNA molecules expressed from the genes of an organism. In our lab we utilize our expertise in the software package Trinity to reassemble and analyze transcriptomic data. Our current work has focused on the novel Uca minax sequence, but future projects are in the works, including a mouse that is highly resistant to cancer and a type of bacteria that regenerates itself.
Hanin Omar (firstname.lastname@example.org)
Hanin Omar, Casey A. Cole, Arjang Fahim, Giuliana Gusmaroli, Stephen Borgianini, Homayoun Valafar, De Novo Assembly of Uca minax Transcriptome from Next Generation Sequencing, Proceedings of the International Conference on Bioinformatics & Computational Biology (BIOCOMP), July 2015, Las Vegas, NV
Casey A. Cole, Hanin Omar, Arjang Fahim, Giuliana Gusmaroli, Homayoun Valafar, Transcriptome Assembly of the Uca minax, Poster presented at National IDeA Symposium of Biomedical Research Excellence (NISBRE), June 2014, Washington D.C.
1200 Catawba Street, Room 301
Department of Computer Science and Engineering
University of South Carolina
Columbia, SC 29208