ChemxSeer, the first publicly available search engine designed specifically for chemical formulae, can tell when “He” refers to helium rather than a person more than nine times out of 10, according to the Penn State College of Information Sciences and Technology (IST) researchers who created the tool. With the new engine, scientists searching for research on CH4, or methane, no longer have to wade through search results about Channel 4 or Chapter 4, because ChemxSeer returns only documents that refer to the chemical formula. The new algorithm can also identify related chemicals with different formula representations, as well as chemicals with related substructures or similarities.
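To make the idea of matching "different formula representations" concrete, here is a minimal Python sketch. It is not the researchers' algorithm, which the article does not describe in detail. The helper names `element_counts` and `same_composition` are hypothetical, and the parser assumes plain formulae with no parentheses, charges, or hydrates. It reduces a formula to its element counts, so CH4 and H4C compare as equal, while a string such as "Channel 4" is rejected as not being a formula at all.

```python
import re
from collections import Counter

# Hypothetical helpers for illustration only; not part of ChemxSeer.
# One token = an element-like symbol (capital letter plus optional
# lowercase letter) followed by an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def element_counts(formula: str) -> Counter:
    """Parse a simple formula such as 'CH4' into element counts.

    Raises ValueError if any part of the string is not a symbol/count token.
    Symbols are not checked against the periodic table.
    """
    counts = Counter()
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:
            # Characters were skipped, so the string is not a plain formula.
            raise ValueError(f"not a simple formula: {formula!r}")
        symbol, digits = match.groups()
        counts[symbol] += int(digits) if digits else 1
        pos = match.end()
    if pos == 0 or pos != len(formula):
        raise ValueError(f"not a simple formula: {formula!r}")
    return counts


def same_composition(a: str, b: str) -> bool:
    """True if two formula strings list the same elements in the same amounts."""
    return element_counts(a) == element_counts(b)


if __name__ == "__main__":
    print(same_composition("CH4", "H4C"))
    print(same_composition("CH4", "C2H6"))
```

A check like this cannot decide whether "He" in running text means helium or a pronoun, since the string parses as a valid formula either way. That decision depends on the surrounding context, which is the harder problem the article describes.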
The ChemxSeer search engine is part of an open-source cyberinfrastructure project that focuses on chemical document search for environmental chemistry and is funded by the National Science Foundation. The grant, awarded to the Penn State Department of Chemistry, aims to enable automatic data analysis. In designing the engine, the researchers built on their expertise in the information-extraction algorithms created for CiteSeer, a search engine for academic and scientific documents.