Ishani-Mondal.github.io

GSOC 2020 Website

Improving the Generated RDF and Validating Its Correctness

At the end of week 2, I had generated the RDF for the Obama abstract. However, I was losing some of the most important triples that could have been generated. This made it necessary to include more information in the white list I had prepared. In addition, I also made use of WordNet, an external resource containing the synonyms and hypernyms/hyponyms of words.

Steps to improve the RDF Triple generation:

Step 1: Perform a dependency parse of the sentence

Step 2: Check the noun chunks of the sentence and take into account the “attr” and “pobj” relations.

    Example: Obama worked as civil rights attorney.

In this example, ‘attorney’ is the ‘attr’ of ‘work’, so we keep this information for later processing. We also keep ‘pobj’ information, such as:

    Example: Obama was born in Honolulu, Hawaii.

Here ‘Honolulu, Hawaii’ is the ‘pobj’ attached to the verb “born” (through the preposition “in”).
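The collection step above can be sketched roughly as follows. This is only an illustration: the `Token` tuple and the `collect_relations` helper are hypothetical names, and the dependency labels are written by hand to mirror the two examples in this post rather than taken from a real parser run (an actual parser may label these tokens differently).

```python
from collections import namedtuple

# Hand-written stand-in for a parsed token: (text, dependency label, head text).
# In the real pipeline these values would come from the dependency parser.
Token = namedtuple("Token", ["text", "dep", "head"])


def collect_relations(tokens, wanted=("attr", "pobj")):
    """Group (head, text) pairs by dependency label, keeping only wanted labels."""
    found = {label: [] for label in wanted}
    for tok in tokens:
        if tok.dep in found:
            found[tok.dep].append((tok.head, tok.text))
    return found


# Labels below follow the description in this post (assumption, not parser output).
worked_sentence = [
    Token("Obama", "nsubj", "worked"),
    Token("worked", "ROOT", "worked"),
    Token("attorney", "attr", "worked"),
]
born_sentence = [
    Token("Obama", "nsubjpass", "born"),
    Token("born", "ROOT", "born"),
    Token("in", "prep", "born"),
    Token("Honolulu", "pobj", "in"),
]

worked_relations = collect_relations(worked_sentence)
born_relations = collect_relations(born_sentence)
print(worked_relations["attr"])
print(born_relations["pobj"])
```

Keeping the head word next to each match makes it possible to tie the ‘attr’ or ‘pobj’ back to its verb or preposition in the later steps.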

Step 3: Determine the type of the object

Use a SPARQL query to determine the type of the object. For example, ‘attorney’ is a Person and ‘Honolulu’ is a Place.
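A type lookup of this kind might look like the sketch below. It only builds the query text and maps a list of returned type URIs to a coarse label; sending the query to a SPARQL endpoint is left out. The helper names `build_type_query` and `coarse_type` are hypothetical, and the sample type list is hand-written for illustration, not an actual endpoint response.

```python
PERSON = "http://dbpedia.org/ontology/Person"
PLACE = "http://dbpedia.org/ontology/Place"


def build_type_query(resource_uri):
    """Build a SPARQL query that lists the rdf:type values of one resource."""
    return (
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
        "SELECT DISTINCT ?type WHERE {\n"
        f"  <{resource_uri}> rdf:type ?type .\n"
        "}"
    )


def coarse_type(type_uris):
    """Reduce a list of rdf:type URIs to 'Person', 'Place', or None."""
    if PERSON in type_uris:
        return "Person"
    if PLACE in type_uris:
        return "Place"
    return None


query = build_type_query("http://dbpedia.org/resource/Honolulu")
print(query)

# Hand-written sample of type URIs (assumption, not a real query result).
sample_types = ["http://www.w3.org/2002/07/owl#Thing", PLACE]
print(coarse_type(sample_types))
```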

Step 4: Include more in the white lists

We make use of five white lists: white_dict.json, type_dict.json, synset_dict.json, person_list and place_list.

We make use of the different lists in the following ways:

  1. The modified, lemmatized verbs are first replaced using the entries in type_dict.

  2. We iterate over the entries of white_dict and check whether any key is contained in the object part generated by PyClausIE. If so, the object is replaced with the corresponding value. For example, if “graduate of” is present, it is replaced by http://dbpedia.org/ontology/college, and this value is moved to the predicate position if the predicate was previously RDF.type. A similar replacement is made on the verb when a white_dict key is present in the predicate; for example, “be born in” is replaced by http://dbpedia.org/ontology/birthPlace.

  3. Finally, we determine the type of the subject. If the predicate is RDF.type, then the object should be of the same type as the subject. For example:

    Example: Obama is the President of the United States.

Both are of Person type, so we keep the Spotlight resource for “President”. Similarly, if an entry of person_list is present as the predicate, we include the information of type PERSON.

For example:

    Example: Obama works as civil rights attorney.

We keep the ‘attorney’ information and use the WordNet synset of ‘attorney’ taken from synset_dict (lawyer).

  4. If the modified predicate indicates “birthplace” or “college”, then the object must be of Place type, so we include the resource of Place type. This information is kept in place_list.
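The white_dict replacement and the Place check described in the list above can be sketched as follows. This is a simplified illustration, not the project's actual code: `apply_white_dict` and `requires_place` are hypothetical helpers, the dictionary holds only the two entries mentioned in this post, and keeping the remainder of the object text after a match is my own assumption about what happens to the object.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Only the two entries mentioned in this post; the real white_dict.json is larger.
WHITE_DICT = {
    "graduate of": "http://dbpedia.org/ontology/college",
    "be born in": "http://dbpedia.org/ontology/birthPlace",
}

# Predicates whose object must be a Place (item 4 of the list above).
PLACE_PREDICATES = {
    "http://dbpedia.org/ontology/college",
    "http://dbpedia.org/ontology/birthPlace",
}


def apply_white_dict(subject, predicate, obj, white_dict=WHITE_DICT):
    """Rewrite one (subject, predicate, object) triple using the white list."""
    for key, value in white_dict.items():
        if key in predicate:
            # The verb phrase matched a key: replace the predicate.
            return subject, value, obj
        if key in obj:
            if predicate == RDF_TYPE:
                # Move the value to the predicate position and keep the
                # rest of the object text (assumption made for this sketch).
                return subject, value, obj.replace(key, "").strip()
            # Otherwise the object itself is replaced by the value.
            return subject, predicate, value
    return subject, predicate, obj


def requires_place(predicate):
    """True when the object of this predicate has to be of Place type."""
    return predicate in PLACE_PREDICATES


college_triple = apply_white_dict("Obama", RDF_TYPE, "graduate of Columbia University")
birth_triple = apply_white_dict("Obama", "be born in", "Honolulu")
print(college_triple)
print(birth_triple)
print(requires_place(birth_triple[1]))
```

Checking the predicate before the object means a matching verb phrase always wins, which keeps a triple such as “be born in” from being rewritten twice.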

Step 5: Final RDF generated

The result can be found here. Validation was done using the RDF Validator.

Thanks for reading!