Contents
  1. Coding styles are especially important for teamwork
  2. “Tmux” is your friend when dealing with large datasets
  3. My interests in text mining and natural language processing

It is a great honor to be selected to participate in the NIH NCBI hackathon. Many talented faculty members and investigators from bioinformatics, neurology, immunology, and physics gathered on the NIH campus in Bethesda, MD. During the three-day event, my team envisioned a data pipeline, called CLINT, that bridges the clinical and academic worlds.

CLINT, as we envision it, is a data-gathering and query pipeline that parses electronic medical record (EMR) reports, interfaces with Neurosynth to produce a list of symptoms correlated with the queried structures (or structures correlated with the queried symptoms), and returns a report in an EMR-ingestible format. More details can be found in our GitHub repo.

coding time

Here is what I took away from this awesome event:

Coding styles are especially important for teamwork
  1. Use “four spaces” instead of
  2. Use
“Tmux” is your friend when dealing with large datasets
  1. During the hackathon, we used datasets as large as 25 GB, which took 20 minutes to load into the server's memory. It would be a disaster
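
To make the tmux point concrete, here is a minimal sketch of starting a long-running load inside a detached tmux session, so the job keeps running on the server after you log out or lose the SSH connection. The session name `hackathon` and the script name `load_dataset.py` are hypothetical placeholders, not the names we used during the event.

```python
import shutil
import subprocess


def new_session_argv(session, command):
    """Build the tmux command that starts `command` in a detached session."""
    return ["tmux", "new-session", "-d", "-s", session, command]


def attach_argv(session):
    """Build the tmux command that reattaches to an existing session."""
    return ["tmux", "attach-session", "-t", session]


def start_detached(session, command):
    """Start the job inside tmux; return False when tmux is not installed."""
    if shutil.which("tmux") is None:
        return False
    subprocess.run(new_session_argv(session, command), check=True)
    return True


# Hypothetical usage: "load_dataset.py" stands in for the real loading script.
# start_detached("hackathon", "python load_dataset.py")
# Later, from a fresh login: subprocess.run(attach_argv("hackathon"))
```

The same two commands can of course be typed directly in a shell; wrapping them in Python is only one way to script the setup.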
My interests in text mining and natural language processing
  1. The major task, to parse the SNOMED condition occurrence, is

  2. Try practical ways of stemming and lemmatization, removing n-grams… More details will be discussed in a future repo.
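
As a rough illustration of the preprocessing ideas in the last item, the sketch below tokenizes a string, applies a toy suffix-stripping stemmer, and enumerates n-grams. It is not the code from the hackathon: `naive_stem` and its suffix list are simplified stand-ins I made up for illustration, and a real pipeline would more likely use an established stemmer or lemmatizer from an NLP library. How the n-grams should then be filtered or removed depends on the task, so the sketch stops at listing them.

```python
import re

# Toy suffix list, checked in order; an assumption for illustration only.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def ngrams(tokens, n):
    """Return all contiguous n-grams as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Example input is invented, not taken from any medical record.
stems = [naive_stem(t) for t in tokenize("Patient reported recurring headaches")]
bigrams = ngrams(stems, 2)
```

A crude stemmer like this over-strips and under-strips many words, which is exactly why trying several practical approaches on real data is worthwhile.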