GSOC 2020 Website
In the first week of the coding period, I implemented the method to extract out the triplets using dependency parse tree and also integrate the results into DBPedia Spotlight The dependency parse tree method is ClauseIE. ClausIE exploits linguistic knowledge about the grammar of the English language to first detect clauses in an input sentence and to subsequently identify the type of each clause according to the grammatical function of its constituents.
For the purpose of extracting out the triplets I have used the input data from here. The details of setting up the environment for extracting out the triplets can be found here.
A clause is a part of a sentence that expresses some coherent piece of information; it consists of one subject (S), one verb (V), and optionally of an indirect object (O), a direct object (O), a complement (C), and one or more adverbials (A).
The algorithm used consists of using different patterns:
Details of the algorithm described in the paper has been summarized below:
STEP 1: Compute the dependency Parse (DP) Tree of the Sentence.
STEP 2: Determine the set of clauses using the DP.
We first identify the clauses in the input sentence, obtain the head word of all the constituents of each clause. First, we construct a clause for every subject dependency in the DP (e.g., nsubj); the dependant constitutes the subject (S) and the governor the verb (V). 4 All other constituents of the clause are dependants of the verb: objects (O) and complements (C) via relations such as: dobj (direct object), iobj (indirect object), xcomp, or ccomp; and adverbials (A) via dependency relations such as advmod, advcl, or prep_in.
STEP 3: Determine the type of clauses using Pattern of the dependency relations. There are seven basic patterns which are as follows:
Clause Type Examples Derived Clauses
============================================================================================================================
Subject-Verb He ran (He,ran)
Subject-Verb-Adverbial He taught at Harvard (He, taught, at Harvard)
Subject-Verb-Complement He is a politician (He, is, a politician)
Subject-Verb-Object He taught constitutional law (He, taught, constitutional law)
Subject-Verb-Object-Adverb He taught constitutional law at Harvard (He, taught, constitutional law, at Harvard)
Subject-Verb-iObject-dobject He gave her the Nobel Prize. (He, gave, her, the Nobel Prize)
Subject-Verb-Object-Complement AE declared the meeting open. (AE, declared, the meeting, open)
Once clauses have been detected, we generate one or more propositions for each clause based on the type of the clause. For example : when we have more than one object phrases, in case of SVOA(Subject-Verb-Object-Adverb), we generate two triples such as : SVO and SVOA therefore, in example of (He, taught, constitutional law, at Harvard), we will have two Triples <He, taught, constitutional law at Harvard> and another one as <He, taught, constitutional law>
Moreover, in case of Subject-Verb-iObject-dobject, we generate the triples in similar way with more focus on the direct object, therefore, in case of (He, gave, her, the Nobel Prize), we get two triples such as: <He, gave, her the Nobel Prize> and another one as <He, gave, the Nobel Prize>
In case of (AE, declared, the meeting, open), we create one triple as : <AE, declared, the meeting open>.
At the end of the first week, I have generated the triples here.