Federated Search | Zehai Wang

Contents

1. Expectation from the community

It is exciting to participate in the project “Prototyping federated cloud-search for biomedical data” in the NIH hackathon. Here, I want to prepare some of the conceptual and technical background needed for successful hacking.

The project seems to be related to the Pilot Phase, which explores using the cloud to access and share FAIR biomedical big data. The objective is to let researchers find and interact with data directly in the cloud.

### Concept Definitions

The first task is to understand the specific terminology used in the project title.

“Prototyping federated cloud-search for biomedical data”

The key parts of the project, hence, will be to “search” in the federated cloud and to apply that function directly to biomedical data. What is “federated search”? How do we build a search engine? What distinguishes biomedical data from other data?

“Federated search”: Deploying a search over distributed and possibly heterogeneous data sets, and receiving in return a unified list of search results. A federated cloud is also known as a cloud federation or a cloud cluster.
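
To make the definition concrete, here is a minimal sketch of a federated search in Python. Everything in it is an assumption made for illustration: the two in-memory "repositories", their field names, the `Hit` record, and the simple term-overlap score are hypothetical and do not come from any NIH system. The point is only the shape of the idea: each source has its own schema and its own adapter, and the federated layer fans the query out and merges the answers into one ranked list.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hit:
    """One entry in the unified result list (hypothetical schema)."""
    source: str
    record_id: str
    title: str
    score: float


# Two hypothetical repositories with deliberately different schemas.
GENOMICS_REPO = [
    {"accession": "GEN-001", "name": "BRCA1 expression in breast tumor samples"},
    {"accession": "GEN-002", "name": "Whole genome sequencing of yeast"},
]
CLINICAL_REPO = [
    {"id": 17, "description": "Clinical outcomes for BRCA1 mutation carriers"},
    {"id": 18, "description": "Tumor imaging study"},
]


def term_overlap(query: str, text: str) -> float:
    """Fraction of query terms that appear as whole words in the text."""
    terms = query.lower().split()
    words = set(text.lower().split())
    if not terms:
        return 0.0
    return sum(1 for term in terms if term in words) / len(terms)


def search_genomics(query: str) -> List[Hit]:
    """Adapter: maps the genomics schema onto the common Hit record."""
    hits = []
    for rec in GENOMICS_REPO:
        score = term_overlap(query, rec["name"])
        if score > 0:
            hits.append(Hit("genomics", rec["accession"], rec["name"], score))
    return hits


def search_clinical(query: str) -> List[Hit]:
    """Adapter: maps the clinical schema onto the common Hit record."""
    hits = []
    for rec in CLINICAL_REPO:
        score = term_overlap(query, rec["description"])
        if score > 0:
            hits.append(Hit("clinical", str(rec["id"]), rec["description"], score))
    return hits


def federated_search(query: str,
                     backends: List[Callable[[str], List[Hit]]]) -> List[Hit]:
    """Send the query to every backend and merge the hits into one ranked list."""
    merged: List[Hit] = []
    for backend in backends:
        merged.extend(backend(query))
    # Highest score first; ties broken by source and id for a stable order.
    return sorted(merged, key=lambda h: (-h.score, h.source, h.record_id))


if __name__ == "__main__":
    for hit in federated_search("BRCA1 tumor", [search_genomics, search_clinical]):
        print(f"{hit.score:.2f}  [{hit.source}] {hit.record_id}: {hit.title}")
```

In a real deployment the adapters would call remote services rather than read local lists, and scores from different sources would probably need normalizing before they can be compared; this sketch sidesteps both by using one shared scoring function.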

NIH is using the BD2K KnowEnG system, deployed on public cloud infrastructure, to provide easy access to state-of-the-art, computationally intensive genomics analysis in a scalable and decentralized manner.

“Search engine”: Brings us closer to the data and databases in the cloud.

“Biomedical data”: According to bioCADDIE (https://datascience.nih.gov/bioCADDIE), it is far more expensive to collect data than to analyze it, so it is especially valuable to analyze the biomedical data that researchers upload to the commons. The challenges in using the emerging biomedical data are:

Expectation from the community

Heterogeneous nature of biomedical data;
Lack of data discovery infrastructure;
Security and authorization;
Services that support interoperability between existing biomedical data and tool repositories, and portability between cloud service providers.
Storing the data online will enable users to integrate scalable cloud computation and explore the results with interactive visualization.