NCBI Hackathon "You are awesome"
Updated:
It is a great honor to have been selected to participate in the NIH NCBI hackathon. Many talented faculty members and investigators from bioinformatics, neurology, immunology, and physics gathered at the NIH campus in Bethesda, MD. During the three-day event, my team envisioned a data pipeline, called CLINT, that bridges the clinical and academic worlds.
CLINT, as we envision it, is a data gathering and query pipeline that will parse electronic medical record (EMR) reports, interface with Neurosynth to produce a list of symptoms correlated with the queried structures (or structures correlated with the queried symptoms), and return a report in an EMR-ingestible format. More details can be found in our GitHub repo.
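To make that query flow concrete, here is a minimal Python sketch of the three stages described above: parse an EMR note, look up correlations in either direction, and serialize a report. It is not CLINT's actual code. The symptom-to-structure table, the function names (`parse_emr`, `query_structures`, `query_symptoms`, `to_emr_format`), and the pipe-delimited output are all hypothetical placeholders; the real pipeline would query Neurosynth and emit whatever format the target EMR ingests.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical symptom -> brain-structure correlations. In CLINT these would
# come from Neurosynth; the entries below are made-up placeholders.
SYMPTOM_TO_STRUCTURES: Dict[str, List[str]] = {
    "memory loss": ["hippocampus"],
    "tremor": ["basal ganglia", "cerebellum"],
}


@dataclass
class Report:
    patient_id: str
    findings: Dict[str, List[str]] = field(default_factory=dict)


def parse_emr(note: str) -> List[str]:
    """Naive parsing: return known symptoms mentioned in a free-text EMR note."""
    lowered = note.lower()
    return [s for s in SYMPTOM_TO_STRUCTURES if s in lowered]


def query_structures(symptoms: List[str]) -> Dict[str, List[str]]:
    """Symptoms -> correlated structures (unknown symptoms are skipped)."""
    return {s: SYMPTOM_TO_STRUCTURES[s] for s in symptoms if s in SYMPTOM_TO_STRUCTURES}


def query_symptoms(structure: str) -> List[str]:
    """Reverse direction: structure -> correlated symptoms."""
    return [s for s, structs in SYMPTOM_TO_STRUCTURES.items() if structure in structs]


def to_emr_format(report: Report) -> str:
    """Serialize as pipe-delimited lines (a placeholder, not a real EMR standard)."""
    lines = [f"PATIENT|{report.patient_id}"]
    for symptom, structures in report.findings.items():
        lines.append(f"FINDING|{symptom}|{','.join(structures)}")
    return "\n".join(lines)


note = "Patient reports memory loss and a mild tremor."
print(to_emr_format(Report("p001", query_structures(parse_emr(note)))))
# PATIENT|p001
# FINDING|memory loss|hippocampus
# FINDING|tremor|basal ganglia,cerebellum
```

Keeping the lookup behind two small query functions means the placeholder table could later be swapped for real Neurosynth results without touching the parsing or reporting stages.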
Here is what I took away from this awesome event:
Coding style is especially important for teamwork
- Use four spaces instead of tabs for indentation.
- Use
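One way to enforce a four-space convention is a quick check for tab characters in leading whitespace (PEP 8 likewise recommends four spaces per indentation level). The sketch below assumes Python source files and uses a hypothetical helper name, `find_tab_indented_lines`; in practice a linter or the standard library's `tabnanny` module covers related ground.

```python
def find_tab_indented_lines(source: str):
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    flagged = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Leading whitespace = everything stripped off by lstrip(" \t").
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            flagged.append(lineno)
    return flagged


sample = "def f():\n\treturn 1\n\ndef g():\n    return 2\n"
print(find_tab_indented_lines(sample))  # [2]
```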
“Tmux” is your friend when dealing with large datasets.
- During the hack, we worked with datasets as large as 25 GB, which took 20 minutes to load into the server's memory. It would be a disaster
My interests in text mining and natural language processing
The major task in parsing the SNOMED condition occurrences is
Try practical approaches to stemming, lemmatization, removing n-grams… More details will follow in a future repo.
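Below is a minimal pure-Python sketch contrasting stemming with lemmatization and dropping unwanted n-grams from a clinical phrase. It is an illustration under stated assumptions, not the approach from our repo: the suffix stripper is a deliberately crude stand-in (not the Porter algorithm), the lemma dictionary is a tiny hypothetical lookup table, and "removing n-grams" is read here as deleting a fixed, hypothetical list of uninformative phrases such as "history of".

```python
import re

# Hypothetical, tiny lemma dictionary; a real pipeline would use a
# lexicon-backed lemmatizer (e.g. one built on WordNet) instead.
LEMMAS = {"seizures": "seizure", "was": "be", "headaches": "headache", "denies": "deny"}

# Crude suffix list for illustration only; this is NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical n-grams treated as uninformative for condition extraction.
DROP_NGRAMS = {("history", "of"), ("no", "known")}


def tokenize(text):
    """Lowercase and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def lemmatize(token):
    """Dictionary lookup; unknown tokens are returned unchanged."""
    return LEMMAS.get(token, token)


def remove_ngrams(tokens, ngrams):
    """Drop every occurrence of the given n-grams (tuples of tokens), longest first."""
    sizes = sorted({len(g) for g in ngrams}, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        for n in sizes:
            if tuple(tokens[i : i + n]) in ngrams:
                i += n  # skip the whole matched n-gram
                break
        else:
            out.append(tokens[i])
            i += 1
    return out


tokens = remove_ngrams(tokenize("History of seizures, denies headaches."), DROP_NGRAMS)
print([lemmatize(t) for t in tokens])  # ['seizure', 'deny', 'headache']
print([stem(t) for t in tokens])       # ['seizur', 'deni', 'headach']
```

The output shows the usual trade-off: stemming is simple but yields non-words like `seizur`, while lemmatization returns readable dictionary forms at the cost of needing a lexicon.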