Early Adopters Workshop, Hands-on session - GATE Transformer with Cyc reasoner and KB
Story
The story in this scenario is based on extracting information from news articles, linking the extracted facts to the Cyc KB and reasoning over them. There are two core stories:
- Identify news articles about a specified area (e.g. software industry)
SELECT ?company WHERE
{ ?company appearsIn <http://uk.reuters.com/article/ousiv/idUKTRE50E23H20090115> ;
rdf:type <http://www.cycfoundation.org/concepts/SoftwareVendor> }
- Identify news articles about competitors (e.g. news about Apple as a competitor to Microsoft)
SELECT ?company WHERE
{ ?company appearsIn <http://uk.reuters.com/article/ousiv/idUKTRE50E23H20090115> ;
rdf:type ?area .
<http://www.cycfoundation.org/concepts/MicrosoftInc> rdf:type ?area }
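To make the intent of the two queries concrete, here is a minimal Python sketch that evaluates both stories over a small in-memory set of triples. It is only an illustration: the triples, the function names and the concept names AppleInc, ReutersInc and NewsAgency are assumptions made up for this example, and no Cyc or LarKC API is involved.

```python
# Hypothetical sample data; AppleInc, ReutersInc and NewsAgency are
# illustrative names, not verified Cyc concepts.
ARTICLE = "http://uk.reuters.com/article/ousiv/idUKTRE50E23H20090115"
CYC = "http://www.cycfoundation.org/concepts/"

TRIPLES = {
    (CYC + "AppleInc", "appearsIn", ARTICLE),
    (CYC + "AppleInc", "rdf:type", CYC + "SoftwareVendor"),
    (CYC + "MicrosoftInc", "rdf:type", CYC + "SoftwareVendor"),
    (CYC + "ReutersInc", "appearsIn", ARTICLE),
    (CYC + "ReutersInc", "rdf:type", CYC + "NewsAgency"),
}


def mentioned_in(triples, article):
    """Return every subject that has an appearsIn link to the article."""
    return {s for (s, p, o) in triples if p == "appearsIn" and o == article}


def companies_in_area(triples, article, area):
    """First story: companies in the article whose rdf:type is `area`."""
    return sorted(
        s for s in mentioned_in(triples, article)
        if (s, "rdf:type", area) in triples
    )


def competitors_of(triples, article, company):
    """Second story: companies in the article sharing a type with `company`."""
    areas = {o for (s, p, o) in triples if s == company and p == "rdf:type"}
    return sorted(
        s for s in mentioned_in(triples, article)
        if any((s, "rdf:type", a) in triples for a in areas)
    )
```

Like the second query itself, `competitors_of` would also return the reference company if it happened to be mentioned in the article; filtering it out is left as a design choice.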
The scenario will use two additional tools on top of the LarKC platform:
- GATE information extraction toolkit for extracting names of companies from the articles
- ResearchCyc for loading extracted facts into its knowledge base and reasoning about the SPARQL queries
After completing this scenario, users will know:
- How to construct a pipeline that can handle non-RDF input data and reason over it
- How transformers work with two concrete examples
- How to include resources located on servers as plug-ins
Plug-ins
The pipeline required for such a scenario consists of the following plug-ins:
- ArticleIdentifier -- identifies relevant web resources from the SPARQL query; in the examples above, that is the Reuters article
- GateTransformer -- downloads the articles and extracts all mentions of organizations
- CycSelecter -- iterates over the organizations extracted by GATE and passes on only those that have a corresponding instance in the ResearchCyc KB. Adds links to the Cyc instances to the SetOfStatements that is passed forward
- CycReasoner -- loads the new facts into the KB, reasons over them and returns a VariableBinding with the relevant organizations
- ScriptedDecider -- scripted pipeline
Code for the plug-ins that implement the scenarios above will be provided to the participants. ResearchCyc will be available on the local network as a server, and the CycReasoner plug-in will connect to that server. GATE will either be used locally by each user as a .jar file or be set up as a web service, depending on potential conflicts between the LarKC and GATE libraries.
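The data flow through the plug-in chain can be sketched in a few lines of Python. This is not the workshop code and does not use the LarKC, GATE or Cyc APIs: every function below is a hypothetical stand-in, the article download and the organization extraction are passed in as plain callables, and the KB is assumed to be a pair of dictionaries.

```python
import re


def identify_articles(query):
    """ArticleIdentifier stand-in: collect the URLs that follow `appearsIn`."""
    return re.findall(r"appearsIn\s+<?(https?://[^\s>]+)>?", query)


def gate_transform(urls, fetch, extract_orgs):
    """GateTransformer stand-in: download each article, extract organization names."""
    return {url: extract_orgs(fetch(url)) for url in urls}


def cyc_select(mentions, cyc_index):
    """CycSelecter stand-in: keep names with a KB instance and link them to it."""
    return [
        (cyc_index[name], "appearsIn", url)
        for url, names in mentions.items()
        for name in names
        if name in cyc_index
    ]


def cyc_reason(statements, kb_types, area):
    """CycReasoner stand-in: bind ?company to instances typed as `area`."""
    return [s for (s, _, _) in statements if area in kb_types.get(s, set())]


def run_pipeline(query, fetch, extract_orgs, cyc_index, kb_types, area):
    """ScriptedDecider stand-in: call the plug-ins in a fixed order."""
    urls = identify_articles(query)
    mentions = gate_transform(urls, fetch, extract_orgs)
    statements = cyc_select(mentions, cyc_index)
    return cyc_reason(statements, kb_types, area)
```

Passing `fetch` and `extract_orgs` as parameters mirrors the choice described above between running GATE locally and calling it as a web service: the rest of the chain does not need to know which one is used.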
Step-by-step
Setting up LarKC platform with the described pipeline
Participants establish a connection to the ResearchCyc server and assemble the required plug-ins into a pipeline.
Execution of the pipeline
Running the pipeline over a few examples and checking the output.
Understanding what is happening under the hood
Going through the pipeline step by step and examining the inputs and outputs of each plug-in.
Creating CycTransformer
Moving the code that links organizations extracted by GATE with Cyc instances out of CycSelecter and into a separate transformer plug-in. Updating the ScriptedDecider to include the new CycTransformer in the pipeline.
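The shape of this refactoring can be sketched as follows. The function names, the `linkedTo` predicate and the dictionary-based KB index are assumptions for illustration only; the real plug-ins use the LarKC interfaces provided at the workshop.

```python
def cyc_transform(mentions, cyc_index):
    """Hypothetical CycTransformer: emit a link statement for every
    organization that has a matching Cyc instance."""
    return [
        (name, "linkedTo", cyc_index[name])
        for name in mentions
        if name in cyc_index
    ]


def cyc_select(mentions, link_statements):
    """Slimmed-down CycSelecter: pass on only organizations that were linked."""
    linked = {name for (name, _, _) in link_statements}
    return [name for name in mentions if name in linked]


def scripted_decider(mentions, cyc_index):
    """Updated script: run the new transformer before the selecter."""
    links = cyc_transform(mentions, cyc_index)
    selected = cyc_select(mentions, links)
    return selected, links
```

After the split, the selecter no longer needs access to the KB index; it only inspects the statements produced by the transformer.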
Adding countries besides organizations
Using GATE to extract countries from the articles as well as organizations.
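One possible way to approach this step is to filter the GATE output by annotation type. The sketch below assumes the annotations have already been converted to plain Python dictionaries, and that countries arrive as `Location` annotations carrying a `locType` feature equal to `country`; both the dictionary layout and the feature name are assumptions to be checked against the GATE configuration actually used.

```python
def extract_entities(annotations):
    """Split annotations into organization and country names.

    Each annotation is assumed to be a dict with "type", "text" and an
    optional "features" dict (hypothetical layout, not a GATE API).
    """
    orgs, countries = [], []
    for ann in annotations:
        features = ann.get("features", {})
        if ann["type"] == "Organization":
            orgs.append(ann["text"])
        elif ann["type"] == "Location" and features.get("locType") == "country":
            countries.append(ann["text"])
    return {"organizations": orgs, "countries": countries}
```

The extracted countries would then need their own linking step to Cyc instances, analogous to the one used for organizations.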
