Voice-based query interface for databases: Stanford Parser

A natural language parser is a program that works out the grammatical structure of sentences, for instance, which groups of words go together (as “phrases”) and which words are the subject or object of a verb.
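To make "which groups of words go together" concrete, here is a minimal sketch that reads a constituency tree in Penn Treebank bracket notation (the bracketed style of tree the Stanford Parser can print) and lists its noun, verb and prepositional phrases. The tree for the example query is hand-written for illustration and is an assumption, not captured parser output. The helper names `parse_tree`, `leaves` and `phrases` are hypothetical.

```python
import re

def tokenize(tree_str):
    # Split a bracketed tree into "(", ")" and atoms (labels or words).
    return re.findall(r"\(|\)|[^\s()]+", tree_str)

def parse_tree(tree_str):
    """Build a nested (label, children) tuple; leaves are plain word strings."""
    tokens = tokenize(tree_str)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return parse_node()

def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1]:
        words.extend(leaves(child))
    return words

def phrases(node, wanted=("NP", "VP", "PP")):
    """Yield (label, text) for every phrase node whose label is in `wanted`."""
    if isinstance(node, str):
        return
    label, children = node
    if label in wanted:
        yield label, " ".join(leaves(node))
    for child in children:
        yield from phrases(child, wanted)

# Hand-written illustrative tree for the example query (not real parser output).
TREE = ("(ROOT (S (VP (VB Show) (NP (PRP me)) (NP (NP (DT the) (NNPS Symptoms)) "
        "(PP (IN of) (NP (NNP Cancer)))))))")

for label, text in phrases(parse_tree(TREE)):
    print(label, text)
```

Run as-is, it prints one phrase per line in pre-order, starting with the VP that spans the whole query and ending with the NP `Cancer`; nested phrases such as `the Symptoms` inside `the Symptoms of Cancer` are both reported.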

The original version of this parser was written by Dan Klein, with support code and linguistic grammar development by Christopher Manning.

A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns a part of speech, such as noun, verb or adjective, to each word (and other token). Computational applications generally use more fine-grained POS tags, such as ‘noun-plural’.

Example:

Query

Show me the Symptoms of Cancer

Tagging

Show/VB    me/PRP    the/DT    Symptoms/NNPS    of/IN    Cancer/NNP
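The tagged output above uses a `word/TAG` layout. A minimal sketch for turning such a line into (word, tag) pairs, assuming whitespace-separated tokens; `parse_tagged` is a hypothetical helper, and the separator is a parameter in case a tagger is configured to use a character other than `/`.

```python
def parse_tagged(text, sep="/"):
    """Split a 'word/TAG word/TAG ...' string into (word, tag) pairs."""
    pairs = []
    for token in text.split():  # split() also collapses runs of spaces
        # rpartition splits on the LAST separator, so a word that itself
        # contains the separator keeps it and the tag is what follows.
        word, found, tag = token.rpartition(sep)
        if not found:
            raise ValueError(f"token without a tag: {token!r}")
        pairs.append((word, tag))
    return pairs

tagged = "Show/VB    me/PRP    the/DT    Symptoms/NNPS    of/IN    Cancer/NNP"
print(parse_tagged(tagged))
```

For the example query this yields six pairs, from `('Show', 'VB')` to `('Cancer', 'NNP')`.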

Alphabetical list of the Penn Treebank part-of-speech tags used:

Number  Tag  Description
1. CC Coordinating conjunction
2. CD Cardinal number
3. DT Determiner
4. EX Existential there
5. FW Foreign word
6. IN Preposition or subordinating conjunction
7. JJ Adjective
8. JJR Adjective, comparative
9. JJS Adjective, superlative
10. LS List item marker
11. MD Modal
12. NN Noun, singular or mass
13. NNS Noun, plural
14. NNP Proper noun, singular
15. NNPS Proper noun, plural
16. PDT Predeterminer
17. POS Possessive ending
18. PRP Personal pronoun
19. PRP$ Possessive pronoun
20. RB Adverb
21. RBR Adverb, comparative
22. RBS Adverb, superlative
23. RP Particle
24. SYM Symbol
25. TO to
26. UH Interjection
27. VB Verb, base form
28. VBD Verb, past tense
29. VBG Verb, gerund or present participle
30. VBN Verb, past participle
31. VBP Verb, non-3rd person singular present
32. VBZ Verb, 3rd person singular present
33. WDT Wh-determiner
34. WP Wh-pronoun
35. WP$ Possessive wh-pronoun
36. WRB Wh-adverb
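The query interface mainly needs a few coarse classes rather than all 36 fine-grained tags: nouns of any form and wh-words in particular. One way to collapse the tags is sketched here; the grouping into `noun`, `verb`, `wh` and `other`, and the name `coarse_category`, are assumptions for illustration.

```python
# Fine-grained Penn Treebank tags grouped into coarse classes (illustrative grouping).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}
WH_TAGS = {"WDT", "WP", "WP$", "WRB"}

def coarse_category(tag):
    """Map a fine-grained tag to 'noun', 'verb', 'wh' or 'other'."""
    if tag in NOUN_TAGS:
        return "noun"
    if tag in VERB_TAGS:
        return "verb"
    if tag in WH_TAGS:
        return "wh"
    return "other"

for tag in ["NNPS", "VB", "WP", "DT"]:
    print(tag, "->", coarse_category(tag))
```

Explicit sets keep the mapping readable; for this tag set, prefix tests on `NN`, `VB` and `W` would give the same result.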

We categorize (parse) these queries into grammatical phrases, which in turn helps us recognize the keywords used to search the database. These keywords are then compared against the database lexicon, which includes the relation names, attributes and values stored in the database. The wh-word (query word) defines an operation in the SQL query, such as SELECT, and a noun of any form (NN, NNS, NNP or NNPS) defines a keyword to be searched for in the database. The remaining tokens describe the relations between these keywords, defining what the user is actually looking for.
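The keyword-matching step can be sketched as follows. Everything here is an assumption for illustration: the one-table schema (`disease` with columns `name` and `symptoms`), the `LEXICON` dictionary, the `build_query` helper, and the rule that imperative verbs such as "show" also trigger SELECT (the example query contains no wh-word). The sketch emits a parameterized query with `?` placeholders and does not handle joins or operations other than SELECT.

```python
WH_TAGS = {"WDT", "WP", "WP$", "WRB"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
# Hypothetical: imperative verbs that are also treated as a request to SELECT.
SELECT_VERBS = {"show", "list", "find", "display"}

# Hypothetical lexicon built from the database: relation names,
# attributes (relation, column) and known values (relation, column, value).
LEXICON = {
    "relations": {"disease": "disease"},
    "attributes": {"symptoms": ("disease", "symptoms")},
    "values": {"cancer": ("disease", "name", "Cancer")},
}

def build_query(tagged):
    """Turn (word, tag) pairs into (sql, params), or None if nothing matches."""
    operation = None
    relations, attributes, conditions = [], [], []
    for word, tag in tagged:
        key = word.lower()
        if tag in WH_TAGS or (tag.startswith("VB") and key in SELECT_VERBS):
            operation = "SELECT"
        elif tag in NOUN_TAGS:
            # A noun of any form is looked up in the lexicon; unknown nouns are ignored.
            if key in LEXICON["relations"]:
                relations.append(LEXICON["relations"][key])
            elif key in LEXICON["attributes"]:
                attributes.append(LEXICON["attributes"][key])
            elif key in LEXICON["values"]:
                conditions.append(LEXICON["values"][key])
    if operation != "SELECT" or not attributes:
        return None
    # Single-table sketch: keep only matches from the chosen relation.
    table = relations[0] if relations else attributes[0][0]
    columns = ", ".join(col for rel, col in attributes if rel == table)
    where = [(col, val) for rel, col, val in conditions if rel == table]
    sql = f"SELECT {columns} FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col, _ in where)
    return sql, [val for _, val in where]

query = [("Show", "VB"), ("me", "PRP"), ("the", "DT"),
         ("Symptoms", "NNPS"), ("of", "IN"), ("Cancer", "NNP")]
print(build_query(query))
```

For the example query this prints `('SELECT symptoms FROM disease WHERE name = ?', ['Cancer'])`. Returning placeholders plus a parameter list, rather than pasting matched values into the SQL string, lets the database driver handle quoting.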