Master's thesis · 2021

Advanced Crawling Solution for the Web Databases

Google for SQL — sub-second search across an enterprise SQL catalogue.

Focused crawler architecture — sub-second SQL catalogue search built on Lucene.NET

A Vue.js-based ETL tool that crawls an enterprise SQL catalogue and returns sub-second results across tables, stored procedures, views, and keys. Supports partial-match and exact-match modes; results render in tabular and graphical views.
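The difference between the two match modes can be illustrated with a minimal sketch. It assumes a small in-memory catalogue and a plain linear scan; the names `CatalogueObject`, `search`, and the sample entries are hypothetical and used only for illustration. The tool itself is described as built on a Lucene.NET index, which this sketch does not reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueObject:
    """One catalogue entry (hypothetical shape, for illustration only)."""
    name: str  # object name, e.g. "Customer"
    kind: str  # "table", "procedure", "view" or "key"


def search(catalogue, term, exact=False):
    """Return entries whose name matches `term`, ignoring case.

    exact=True  -> the whole name must equal the term
    exact=False -> the term may appear anywhere in the name
    """
    needle = term.lower()
    if exact:
        return [o for o in catalogue if o.name.lower() == needle]
    return [o for o in catalogue if needle in o.name.lower()]


# Sample data, invented for this example.
catalogue = [
    CatalogueObject("Customer", "table"),
    CatalogueObject("CustomerOrders", "view"),
    CatalogueObject("usp_GetCustomer", "procedure"),
    CatalogueObject("PK_Order", "key"),
]

partial_hits = [o.name for o in search(catalogue, "customer")]
exact_hits = [o.name for o in search(catalogue, "customer", exact=True)]
```

Here a partial-match query for "customer" returns the table, the view, and the stored procedure, while the exact-match query returns only the table. A linear scan like this would not reach sub-second latency on a large catalogue, which is presumably why the tool relies on a prebuilt index.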

The thesis also proposed a deep-web harvesting design using parsing, PageRank, and binary-vector ranking, drawing on Stanford CoreNLP for topic classification and DBpedia Spotlight for semantic annotation. Highest-graded thesis, BIT Mathematics Department.
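As a rough illustration of binary-vector ranking, the sketch below assumes one common formulation: each page and the query are encoded as 0/1 vectors over a fixed vocabulary, and pages are ordered by how many query terms they share. The thesis's actual scoring formula is not given in this summary, so the function names, the vocabulary, and the sample pages are all hypothetical.

```python
def binary_vector(text, vocabulary):
    """Encode text as a 0/1 vector: 1 where the vocabulary term occurs."""
    words = set(text.lower().split())
    return [1 if term in words else 0 for term in vocabulary]


def rank(query, pages, vocabulary):
    """Order page titles by the number of vocabulary terms shared with the query.

    Assumption: the score is the dot product of the two binary vectors.
    Pages with no shared term are dropped; ties are broken by title.
    """
    q = binary_vector(query, vocabulary)
    scored = []
    for title, text in pages.items():
        d = binary_vector(text, vocabulary)
        score = sum(a * b for a, b in zip(q, d))
        if score > 0:
            scored.append((score, title))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


# Invented vocabulary and pages, for illustration only.
vocabulary = ["sql", "index", "crawler", "ranking"]
pages = {
    "tuning-guide": "sql index tuning guide",
    "crawler-notes": "focused crawler with ranking and sql index",
    "recipes": "cooking recipes",
}

ordered = rank("sql index ranking", pages, vocabulary)
```

With this data, "crawler-notes" shares three query terms and "tuning-guide" two, so they are returned in that order, and "recipes" is omitted. A production design would likely combine such a score with a link-based signal like PageRank rather than use it alone.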

Vue.js
ETL
Stanford CoreNLP
DBpedia Spotlight