Tuesday, May 10 • 3:00pm - 3:50pm
Focused Crawling with Apache Nutch - Sujen Shah, NASA JPL

The vast scale of the Web has forced researchers to continually develop advanced data acquisition strategies that overcome a multitude of obstacles in order to acquire relevant topical content and adapt it to their needs. Many groups have researched focused Web crawling techniques to better guide their data acquisition efforts; however, few approaches consider the scenario where one wishes to undertake DD on the open Web for which no prior semantic knowledge resources are available. Sujen and his team have investigated and developed a new application of the cosine similarity metric (CSM), which has been implemented as part of a novel strategy for domain-specific DD.

In this presentation, Sujen will review recent work in focused crawling and the ability to run similarity scoring within a production-ready, scalable Web crawler, Apache Nutch.
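
For readers unfamiliar with the cosine similarity metric mentioned in the abstract, the following is a minimal sketch of how a crawler might score fetched pages against a seed description of the target domain. It is not the Nutch plugin discussed in the talk: the function names (`term_vector`, `cosine_similarity`, `rank_pages`), the simple tokenizer, and the example URLs are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercase the text, split on non-alphanumerics, and count term frequencies."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as unrelated
    return dot / (norm_a * norm_b)


def rank_pages(seed_text, pages):
    """Return (score, url) pairs, most similar to the seed text first.

    `pages` maps a URL to its extracted text; both are hypothetical inputs.
    """
    seed = term_vector(seed_text)
    scored = [(cosine_similarity(seed, term_vector(text)), url)
              for url, text in pages.items()]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Hypothetical seed description and page texts, for illustration only.
    seed = "focused web crawling with similarity scoring"
    pages = {
        "http://example.com/crawling": "notes on focused crawling and scoring of web pages",
        "http://example.com/cooking": "a recipe for lemon cake",
    }
    for score, url in rank_pages(seed, pages):
        print(f"{score:.3f}  {url}")
```

A focused crawler could use such scores to prioritize outlinks from pages that resemble the seed text, so that pages on unrelated topics are fetched later or not at all.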

Speakers
Sujen Shah

Research Intern, NASA Jet Propulsion Laboratory
Sujen is a Master's student pursuing Computer Science at the University of Southern California, Los Angeles. As a committer and member of the Apache Nutch PMC, his work includes augmenting the focused crawling capabilities of Nutch. These new scoring plugins are supporting the efforts of NASA JPL in the DARPA MEMEX program. Sujen's interests lie in the field of data mining and Web information retrieval, and he is passionate about being involved in...



Tuesday May 10, 2016 3:00pm - 3:50pm
Georgia B
