WP6+WP3 session on LB-SMA during Lund plenary
Lund, 11th May 2011
Participants
- CEFRIEL: Emanuele, Irene
- Saltlux: Tony, SeonHo
- Siemens: Volker, Yi
- UTC: Raluca, Silvio (in the morning)
Agenda
- objective setting
- check of current state
- planning towards the review
- Open issues
  - language
  - instantiation speed
  - sentiment analysis speed
- integration
  - Interface between BOTTARI and LarKC
  - queries
  - sample answers
Discussion
Objective setting
- Emanuele explains the proposal for an interface between BOTTARI mobile application and back-end LarKC workflows
- Tony's video of BOTTARI shows how the application works
- purpose is to recreate (at least partially) the functionalities of BOTTARI with WP3 plugins
- NEED: the list of all queries BOTTARI needs to ask and a full sample of the results it expects
- Yi asks about the language barrier --> Tony says they are going to use Google Translate, but not directly in the triple store because of the data size (5M tweets/day)
- Tony raises the issue of integration between Saltlux's work and the rest of the research
Check of current state
- BOTTARI
- SeonHo presents BOTTARI:
- ontology and ontology instantiation:
- Twitter data + sentiment/reputation + location
- gathering Twitter data by adaptive crawling (influencers are given more importance); Tony comments on the crawling techniques (API, streaming API, direct crawling, robot accounts) and on how they are combined; the ranking of users used to adapt the crawling is based on how many tweets they post, how many followers they have and how many retweets/mentions they get
- 2 months of Korean data gathered: 356M tweets (about 5M per day) and 1.1M users (about 14K new users per day) --> each user tweeted about 300-400 times in that period (out of a 50M population!)
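A minimal sketch of how a user ranking could drive the adaptive crawling, using only the three signals listed above (tweets, followers, retweets/mentions). The weights, the per-batch max normalisation and the names `UserStats`, `influence_score` and `crawl_order` are hypothetical; the notes do not say how Saltlux actually combines the signals.

```python
from dataclasses import dataclass

@dataclass
class UserStats:
    user_id: str
    tweets: int             # how many tweets the user posted
    followers: int          # how many followers the user has
    retweets_mentions: int  # how often the user is retweeted or mentioned

# Hypothetical weights: the notes list the signals, not how they are combined.
WEIGHTS = {"tweets": 0.2, "followers": 0.3, "retweets_mentions": 0.5}

def influence_score(user: UserStats, max_vals: dict) -> float:
    """Weighted sum of the signals, each normalised to [0, 1] by the batch maximum."""
    score = 0.0
    for field, weight in WEIGHTS.items():
        top = max_vals[field]
        score += weight * (getattr(user, field) / top if top else 0.0)
    return score

def crawl_order(users: list) -> list:
    """User ids sorted so that the most influential users are crawled first."""
    if not users:
        return []
    max_vals = {field: max(getattr(u, field) for u in users) for field in WEIGHTS}
    ranked = sorted(users, key=lambda u: influence_score(u, max_vals), reverse=True)
    return [u.user_id for u in ranked]

users = [
    UserStats("a", tweets=10, followers=100, retweets_mentions=50),
    UserStats("b", tweets=400, followers=10, retweets_mentions=0),
    UserStats("c", tweets=50, followers=1000, retweets_mentions=500),
]
print(crawl_order(users))  # ['c', 'b', 'a']
```

Normalising by the batch maximum keeps the three signals on comparable scales; a real crawler would more likely keep running statistics across batches.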
- sentiment analysis: each tweet about a named place is analysed to compute some ratings (reputation along different features); 2 different technologies: rule-based analysis (based on NLP/ML and on the language used in tweets --> they use Weka ML to generate rules to apply to new messages; Raluca suggests trying Mahout in place of Weka) or SVM (language-independent model); Emanuele suggests that, even if for now they use one or the other, they could explore a combination of the two (like Yi's reasoning by committee); performance experiments (test accuracy: 70% with generated rules, 90% with manual rules, 50-60% with SVM), the goal is 80% accuracy
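One possible way to combine the two technologies by committee, sketched under assumptions: each classifier returns a label plus a confidence, and its vote is weighted by its measured test accuracy (0.70 for generated rules; 0.55 is an assumed midpoint of the 50-60% SVM figure). The classifiers below are toy stand-ins, not the Weka rules or the SVM model, and `committee` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Callable, List, Tuple

# A classifier maps a tweet to (label, confidence in [0, 1]).
Classifier = Callable[[str], Tuple[str, float]]

def committee(members: List[Tuple[Classifier, float]], text: str) -> str:
    """Weighted vote: each member adds weight * confidence to the label it predicts."""
    votes = defaultdict(float)
    for classify, weight in members:
        label, confidence = classify(text)
        votes[label] += weight * confidence
    return max(votes, key=votes.get)

# Toy stand-ins for the rule-based (Weka) and SVM classifiers.
def rule_based(text: str) -> Tuple[str, float]:
    return ("positive", 0.6) if "good" in text else ("neutral", 0.5)

def svm(text: str) -> Tuple[str, float]:
    return ("negative", 0.9) if "slow" in text else ("positive", 0.7)

# Weights: 0.70 = generated-rules test accuracy; 0.55 = assumed midpoint of 50-60% for SVM.
members = [(rule_based, 0.70), (svm, 0.55)]
print(committee(members, "good pasta"))                  # positive
print(committee(members, "good food but slow service"))  # negative
```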
- ontology instantiation: the only missing properties are the discussion ones (retweet/reply); Emanuele notes that we should add lat/long and place category to named places; Emanuele also suggests adding a link between user and named place, and the shares of positive/neutral/negative reputation of named places; followers/following instantiation is very slow... Emanuele proposes using D2R or similar; Irene suggests concentrating on Twitter users who actually talked about some named place
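Irene's suggestion amounts to pruning the follower graph before instantiation. A minimal sketch, assuming follower edges and place mentions are available as plain pairs; the helper name is hypothetical, and keeping an edge only when both users mentioned a named place is an assumption (keeping edges with at least one such user would be the looser variant).

```python
def edges_to_instantiate(follow_edges, place_mentions):
    """Keep only (follower, followee) edges whose two users both talked about a named place.

    follow_edges: iterable of (follower_id, followee_id) pairs
    place_mentions: iterable of (user_id, named_place_id) pairs taken from analysed tweets
    """
    active = {user for user, _place in place_mentions}
    return [(a, b) for a, b in follow_edges if a in active and b in active]

mentions = [("alice", "placeA"), ("bob", "placeB")]
edges = [("alice", "bob"), ("alice", "carol"), ("dave", "bob")]
print(edges_to_instantiate(edges, mentions))  # [('alice', 'bob')]
```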
- plan to use LarKC
- several possible workflow structures; DOR (document object retriever) is the text mining platform; SOR is the framework that combines different back-end repositories, including OWL-based reasoners and other sentiment analysis "reasoners"
- current and planned LarKC workflows as envisioned by Saltlux --> still
- demo
- arrangement for the review: panels with images of Seoul in different orientations to demonstrate AR
- maven identification --> to send questions about the restaurant
- recommendations --> to provide further suggestions and to send messages about the place (via Twitter again? maybe in the future)
- tweets about nearby places based on location
- visualize the map behind the place
- Queries!
- Emanuele shows the queries we put in the MSM paper to understand if they are meaningful for the BOTTARI application
- opinion maker query --> it can be done in C-SPARQL (Cefriel/PoliMi work) or in plain SPARQL without the time dimension
- "who shall I follow" query --> it can be done with SPARQL with probabilities (Siemens work)
- let's try to agree on the needed queries
- AR query: just a SELECT to find details (lat, long, elevation, category, positive/negative/neutral counts) with a FILTER on the distance from the current position --> simple query, Saltlux business
# SEE QUERY NUMBER 20 IN THE ECLIPSE PROJECT
# bot: prefix indicates the "Lyon" ontology
SELECT ?poi ?name ?lat ?long ?altitude ?category ?numPos ?numNeu ?numNeg
WHERE {
?poi a bot:NamedPlace ;
bot:name ?name ;
geo:lat ?lat ;
geo:long ?long ;
geo:alt ?altitude ;
skos:subject ?category .
OPTIONAL { ?poi bot:numeberOfPositiveTweets ?numPos . }
OPTIONAL { ?poi bot:numeberOfNeutralTweets ?numNeu . }
OPTIONAL { ?poi bot:numeberOfNegativeTweets ?numNeg . }
FILTER( ?lat>"45.46"^^xsd:float
&& ?lat<"45.47"^^xsd:float
&& ?long>"9.18"^^xsd:float
&& ?long<"9.20"^^xsd:float ) # the coordinates are just examples; datatype TO-FIX
}
PROBLEMS, COMMENTS and TASKS
- add bot:numeberOfPositiveTweets, bot:numeberOfNeutralTweets, bot:numeberOfNegativeTweets to the ontology
- shall we use skos:subject instead of bot:category?
- in the final version we should change the filter clause to something like distance(POINT(?lat,?long), POINT("45.56","9.45"))<"100"meters
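The `distance(...)` function above is not standard SPARQL; it would have to be provided as a custom function (for example through the `PREFIX f: <java:ext.>` mechanism used by the later queries). The computation itself could be the haversine great-circle distance, sketched here in Python on a spherical Earth of mean radius 6,371 km; `distance_m` and `within` are hypothetical names.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)

def distance_m(lat1: float, long1: float, lat2: float, long2: float) -> float:
    """Haversine great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(long2 - long1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))  # clamp rounding noise

def within(lat: float, long: float, centre_lat: float, centre_long: float,
           radius_m: float = 100) -> bool:
    """The test the final FILTER would express: distance(...) < radius_m."""
    return distance_m(lat, long, centre_lat, centre_long) < radius_m

print(round(distance_m(45.0, 9.0, 46.0, 9.0)))  # 111195 (one degree of latitude)
```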
- selected POI query and display of alternative POIs:
- SELECT of recommended POIs given the current one (FILTER condition on distance and same category) --> simple query, Saltlux business
# SEE QUERY NUMBER 21 IN THE ECLIPSE PROJECT
SELECT ?poi ?lat ?long ?category
WHERE {
data:ZuccaInGalleria a bot:NamedPlace ;
geo:lat ?givenLat ;
geo:long ?givenLong ;
skos:subject ?category .
?poi a bot:NamedPlace ;
geo:lat ?lat ;
geo:long ?long ;
skos:subject ?category . # same category as before
FILTER (?poi!=data:ZuccaInGalleria) # different POI
FILTER(
(?lat-?givenLat)<"0.1"^^xsd:float &&
(?lat-?givenLat)>"-0.1"^^xsd:float &&
(?long-?givenLong)<"0.1"^^xsd:float &&
(?long-?givenLong)>"-0.1"^^xsd:float
)
}
PROBLEMS, COMMENTS and TASKS
- in the final version we should replace the filter clause, which naively compares lat, givenLat, long and givenLong, with something like distance(POINT(?lat,?long), POINT(?givenLat,?givenLong))<"100"meters
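Note that the current bounding box is much wider than the 100 m target: 0.1 degrees of latitude is roughly 11 km (one degree is about 111 km), and a degree of longitude shrinks with cos(latitude). The box can still serve as a cheap pre-filter before an exact distance check if its half-widths are derived from the radius. A sketch, with the hypothetical helper `bbox_half_widths`:

```python
import math

METRES_PER_DEGREE_LAT = 111_195  # approx., spherical Earth of radius 6,371 km

def bbox_half_widths(lat_deg: float, radius_m: float) -> tuple:
    """(dlat, dlong) in degrees of a box enclosing a circle of radius_m centred at lat_deg.

    Not valid close to the poles, where cos(latitude) tends to 0.
    """
    dlat = radius_m / METRES_PER_DEGREE_LAT
    dlong = radius_m / (METRES_PER_DEGREE_LAT * math.cos(math.radians(lat_deg)))
    return dlat, dlong

dlat, dlong = bbox_half_widths(45.46, 100)  # around Milan, 100 m radius
print(f"{dlat:.5f} {dlong:.5f}")  # 0.00090 0.00128
```

The resulting values would replace the `"0.1"^^xsd:float` bounds, with the exact distance test applied only to the POIs that pass the box.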
- same query but with probabilities (e.g. for missing data about the POI category), with probabilities on POI similarity, or with probabilities about reputation --> are there enough features per POI to train a machine learning algorithm? we can try; this needs investigation
SELECT ?poi ?lat ?long ?category
WHERE {
data:ZuccaInGalleria a bot:NamedPlace ;
geo:lat ?givenLat ;
geo:long ?givenLong .
?poi a bot:NamedPlace ;
geo:lat ?lat ;
geo:long ?long .
FILTER (?poi!=data:ZuccaInGalleria)
FILTER(
(?lat-?givenLat)<"0.1"^^xsd:float &&
(?lat-?givenLat)>"-0.1"^^xsd:float &&
(?long-?givenLong)<"0.1"^^xsd:float &&
(?long-?givenLong)>"-0.1"^^xsd:float )
data:ZuccaInGalleria bot:similar ?poi . WITH PROB ?p
# for the actual "implementation" of SPARQL with probabilities, see the next query and the discussion below
}
ORDER BY DESC(?p) # higher probability first
LIMIT 10
# SEE QUERY NUMBER 23 IN THE ECLIPSE PROJECT
PREFIX f: <java:ext.>
SELECT ?poi ?lat ?long ?category (f:similarWithProbability(data:ZuccaInGalleria,?poi) AS ?p )
WHERE {
data:ZuccaInGalleria a bot:NamedPlace ;
geo:lat ?givenLat ;
geo:long ?givenLong .
?poi a bot:NamedPlace ;
geo:lat ?lat ;
geo:long ?long .
FILTER (?poi!=data:ZuccaInGalleria)
FILTER((?lat-?givenLat)<"0.1"^^xsd:float &&
(?lat-?givenLat)>"-0.1"^^xsd:float &&
(?long-?givenLong)<"0.1"^^xsd:float &&
(?long-?givenLong)>"-0.1"^^xsd:float )
data:ZuccaInGalleria bot:similar ?poi .
}
ORDER BY DESC(?p)
LIMIT 10
PROBLEMS, COMMENTS and TASKS
- those inherited by previous queries
- substitute the current dummy implementation of the custom function similarWithProbability(IRIa,IRIb) with one that invokes SUNS
- check whether the LarKC endpoint accepts these queries, which include custom functions
- SELECT POIs of the same category as the current POI that the user would probably like (those that SUNS predicts s/he is most likely to rate positively now)
SELECT ?poi ?name ?category ?p
WHERE {
<http://bottari.kr/GiulioPaneOlio> geo:lat ?lat ; # sample current POI
geo:long ?long ;
bot:category ?category .
?poi geo:lat ?lat1 ; # target POI
geo:long ?long1 ;
bot:name ?name ;
bot:category ?category . # same category as the current POI
FILTER (distance(POINT(?lat,?long), POINT(?lat1,?long1))<"100"^^meters) # filter on the distance; datatype TO-FIX
<http://bottari.kr/Emanuele> bot:post [
bot:talksPositivelyAbout ?poi ]
. WITH PROB ?p # probability that the current user will tweet positively about this POI
# for the actual "implementation" of SPARQL with probabilities, see below
}
ORDER BY DESC(?p) # highest probability first
LIMIT 10
# SEE QUERY NUMBER 24 IN THE ECLIPSE PROJECT
PREFIX f: <java:ext.>
SELECT ?poi ?name ?category (f:talksAboutPositivelyWithProbability(data:Alice,?poi) AS ?p )
WHERE {
data:ZuccaInGalleria a bot:NamedPlace ;
geo:lat ?givenLat ;
geo:long ?givenLong ;
skos:subject ?category .
?poi a bot:NamedPlace ;
geo:lat ?lat ;
geo:long ?long ;
skos:subject ?category .
FILTER (?poi!=data:ZuccaInGalleria)
FILTER((?lat-?givenLat)<"0.1"^^xsd:float &&
(?lat-?givenLat)>"-0.1"^^xsd:float &&
(?long-?givenLong)<"0.1"^^xsd:float &&
(?long-?givenLong)>"-0.1"^^xsd:float )
data:Alice bot:posts ?tweet .
?tweet bot:talksAboutPositively ?poi .
FILTER(f:talksAboutPositivelyWithProbability(data:Alice,?poi)>0.5)
}
ORDER BY DESC(?p)
LIMIT 10
PROBLEMS, COMMENTS and TASKS
- those inherited by previous queries
- substitute the current dummy implementation of the custom function talksAboutPositivelyWithProbability(IRIa,IRIb) with one that invokes SUNS
- counting the positive/negative/neutral tweets about a given POI using C-SPARQL (stream with windows) --> not sure whether small windows are meaningful in this case...
SELECT ?poi (COUNT(?positiveTweet) AS ?numPos) (COUNT(?negativeTweet) AS ?numNeg) (COUNT(?neutralTweet) AS ?numNeu)
FROM STREAM <http://bottari.kr/streamOftweets> [RANGE 10d STEP 1d] # register the stream of tweets to consider and the time window
WHERE {
{
?positiveTweet a bot:Tweet ;
bot:talksPositivelyAbout ?poi .
} UNION
{
?negativeTweet a bot:Tweet ;
bot:talksNegativelyAbout ?poi .
} UNION
{
?neutralTweet a bot:Tweet ;
bot:talksNeutrallyAbout ?poi .
}
}
GROUP BY ?poi
- more meaningful queries:
- tweets about a given kind of POI by people similar to me who tweeted nearby in the last x minutes;
SELECT ?poi ?user (similarity(<http://bottari.kr/Emanuele>, ?user) AS ?p)
FROM STREAM <http://bottari.kr/streamOftweets> [RANGE 1h STEP 10m]
WHERE {
?user bot:post ?t . # target user tweeted
?t bot:talksPositivelyAbout ?poi . # his tweet was about a POI
?poi geo:lat ?lat ; # target POI has a position and a category
geo:long ?long ;
bot:category ?category .
<http://bottari.kr/Emanuele> geo:lat ?eLat ; # position of the current user
geo:long ?eLong ;
bot:post ?eT . # the current user tweeted
?eT bot:talksPositivelyAbout ?ePoi . # his tweet was about another POI
?ePoi bot:category ?category . # the other POI has the same category as the target one
FILTER (distance(POINT(?lat,?long), POINT(?eLat,?eLong))<"1000"^^meters) # the target POI is close to the current user
}
ORDER BY DESC(?p)
LIMIT 10
# SEE QUERY NUMBER 25 IN THE ECLIPSE PROJECT
PREFIX f: <java:ext.>
SELECT ?poi1 ?poi2 ?user (f:similarWithProbability(data:Alice, ?user) AS ?p)
WHERE {
?user bot:posts ?t1 .
?t1 bot:talksAboutPositively ?poi1 .
?poi1 a bot:NamedPlace ;
geo:lat ?lat1 ;
geo:long ?long1 ;
skos:subject ?category .
data:Alice geo:lat ?givenLat ;
geo:long ?givenLong ;
bot:posts ?t2 .
?t2 bot:talksAboutPositively ?poi2 .
?poi2 a bot:NamedPlace ;
geo:lat ?lat2 ;
geo:long ?long2 ;
skos:subject ?category .
FILTER(?t1!=?t2)
FILTER(f:similarWithProbability(data:Alice, ?user)>0.5)
FILTER((?lat1-?givenLat)<"0.1"^^xsd:float &&
(?lat1-?givenLat)>"-0.1"^^xsd:float &&
(?long1-?givenLong)<"0.1"^^xsd:float &&
(?long1-?givenLong)>"-0.1"^^xsd:float )
}
ORDER BY DESC(?p)
LIMIT 10
PROBLEMS, COMMENTS and TASKS
- the description of the query does not match the query itself; I did my best to implement it, but I am not sure I got it right
- people around me who may like a POI around me next;
SELECT ?user ?nextPoi (probability(?user, ?nextPoi) AS ?p)
FROM STREAM <http://bottari.kr/streamOftweets> [RANGE 1h STEP 10m]
WHERE {
?user bot:post ?t . # target user tweeted
?t bot:talksAbout ?poi . # his tweet was about a POI
?poi geo:lat ?lat ; # that POI is in a position
geo:long ?long .
<http://bottari.kr/Emanuele> geo:lat ?eLat ; # the current user is in a position
geo:long ?eLong .
FILTER (distance(POINT(?lat,?long), POINT(?eLat,?eLong))<"1000"^^meters) # the POI is close to the current user
?user bot:post [ bot:talksPositivelyAbout ?nextPoi ] . # the target user could tweet about another target POI
FILTER ( probability(?user, ?nextPoi) <1 ) # but he hasn't done it yet (otherwise the probability would have been 1)
?nextPoi geo:lat ?nextLat ; # the target POI is in a position
geo:long ?nextLong .
FILTER (distance(POINT(?nextLat,?nextLong), POINT(?eLat,?eLong))<"1000"^^meters) # the target POI is close to the current user
}
ORDER BY DESC(?p)
LIMIT 10
# SEE QUERY NUMBER 26 IN THE ECLIPSE PROJECT
PREFIX f: <java:ext.>
SELECT ?user ?nextPoi (f:talksAboutPositivelyWithProbability(?user,?nextPoi) AS ?p )
WHERE {
?user bot:posts ?t1 .
?t1 bot:talksAboutPositively ?poi1 .
?poi1 a bot:NamedPlace ;
geo:lat ?lat ;
geo:long ?long .
data:Alice geo:lat ?givenLat ;
geo:long ?givenLong .
FILTER((?lat-?givenLat)<"0.1"^^xsd:float &&
(?lat-?givenLat)>"-0.1"^^xsd:float &&
(?long-?givenLong)<"0.1"^^xsd:float &&
(?long-?givenLong)>"-0.1"^^xsd:float )
?user bot:posts ?t2 .
FILTER(?t1!=?t2)
?t2 bot:talksAboutPositively ?nextPoi .
FILTER(f:talksAboutPositivelyWithProbability(?user,?nextPoi)<1)
?nextPoi a bot:NamedPlace ;
geo:lat ?nextPoiLat ;
geo:long ?nextPoiLong .
FILTER((?nextPoiLat-?givenLat)<"0.1"^^xsd:float &&
(?nextPoiLat-?givenLat)>"-0.1"^^xsd:float &&
(?nextPoiLong-?givenLong)<"0.1"^^xsd:float &&
(?nextPoiLong-?givenLong)>"-0.1"^^xsd:float )
}
ORDER BY DESC(?p)
LIMIT 10
PROBLEMS, COMMENTS and TASKS
- those inherited by the previous queries
- POIs around me that I could like
SELECT ?poi (probability(<http://bottari.kr/Emanuele>, ?poi) AS ?p)
FROM STREAM <http://bottari.kr/Emanuele/position> [RANGE 10m STEP 1m] # do we have this stream? do we need it?
WHERE {
<http://bottari.kr/Emanuele> geo:lat ?eLat ; # the user is in a position
geo:long ?eLong .
?poi geo:lat ?poiLat ; # the target POI is in another position
geo:long ?poiLong .
FILTER (distance(POINT(?poiLat,?poiLong), POINT(?eLat,?eLong))<"100"^^meters) # the target POI is close to the user
<http://bottari.kr/Emanuele> bot:post [ bot:talksPositivelyAbout ?poi ] . # the current user could tweet positively about that POI
FILTER(probability(<http://bottari.kr/Emanuele>, ?poi)<1) # but he hasn't done it yet (otherwise the probability would have been 1)
}
ORDER BY DESC(?p)
LIMIT 10
# SEE QUERY NUMBER 27 IN THE ECLIPSE PROJECT
PREFIX f: <java:ext.>
SELECT ?poi (f:talksAboutPositivelyWithProbability(data:Alice,?poi) AS ?p )
WHERE {
data:Alice geo:lat ?givenLat ;
geo:long ?givenLong .
?poi a bot:NamedPlace ;
geo:lat ?lat ;
geo:long ?long .
FILTER((?lat-?givenLat)<"0.1"^^xsd:float &&
(?lat-?givenLat)>"-0.1"^^xsd:float &&
(?long-?givenLong)<"0.1"^^xsd:float &&
(?long-?givenLong)>"-0.1"^^xsd:float )
data:Alice bot:posts ?t .
?t bot:talksAboutPositively ?poi .
FILTER(f:talksAboutPositivelyWithProbability(data:Alice,?poi)<1)
}
ORDER BY DESC(?p)
LIMIT 10
PROBLEMS, COMMENTS and TASKS
- those inherited by the previous queries
[
Aside discussion: how to model "with probability" in RDF/SPARQL
- in the data model
- using named graphs that group the predicted matrix content by probability value (each graph covers a range of probability values, e.g. [0,0.1], [0.1,0.2], etc.)
- creating a SPARQL custom function for the probability, to be used both as a projection function and as an extended value-testing (FILTER) function
DECISION: use custom function
]
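For comparison with the chosen custom-function approach, the named-graph option could look like the sketch below: each predicted triple is written as an N-Quads line into the graph of its probability bucket, so probabilities survive only at bucket resolution (0.1 here). The graph IRI base `http://example.org/graphs/prob/` and the helper names are hypothetical.

```python
GRAPH_BASE = "http://example.org/graphs/prob/"  # hypothetical base IRI for bucket graphs
N_BUCKETS = 10  # buckets [0,0.1), [0.1,0.2), ..., [0.9,1.0]

def bucket_graph(p: float) -> str:
    """IRI of the named graph holding a triple predicted with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    index = min(int(p * N_BUCKETS), N_BUCKETS - 1)  # p == 1.0 goes into the last bucket
    return f"{GRAPH_BASE}{index / N_BUCKETS:.1f}-{(index + 1) / N_BUCKETS:.1f}"

def to_nquad(subject: str, predicate: str, obj: str, p: float) -> str:
    """Serialise one predicted triple (IRIs only) as an N-Quads line in its bucket graph."""
    return f"<{subject}> <{predicate}> <{obj}> <{bucket_graph(p)}> ."

print(bucket_graph(0.35))  # http://example.org/graphs/prob/0.3-0.4
```

A query would then restrict itself to the high-probability graphs with `GRAPH ?g { ... }`, but ranking inside a bucket is lost, which is one argument for the custom function.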
Planning towards the review
- Plugins and Workflows
- last query: a plugin that takes the query + a C-SPARQL plugin that selects the POIs around me + SUNS, which calculates the probability of me liking each POI + a SPARQL processor that applies the FILTER/ORDER BY/LIMIT and returns the results
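The chain of plugins for the last query could be prototyped with dummy stages before the real plugins exist, in the spirit of the plan below. A sketch in plain Python, assuming POIs are (id, lat, long) tuples and that the SUNS stage is any callable returning a probability; the stage functions, the toy POI ids and coordinates, and the ±0.1 degree box (borrowed from the queries above) are stand-ins, not LarKC plugin APIs.

```python
from typing import Callable, List, Tuple

Poi = Tuple[str, float, float]  # (POI id, lat, long)

def pois_around(pois: List[Poi], lat: float, long: float,
                half_width_deg: float = 0.1) -> List[str]:
    """Stage 1, C-SPARQL role (stub): POIs inside the same +/-0.1 degree box as the queries."""
    return [pid for pid, plat, plong in pois
            if abs(plat - lat) < half_width_deg and abs(plong - long) < half_width_deg]

def run_workflow(pois: List[Poi], lat: float, long: float,
                 predict: Callable[[str], float], k: int = 10) -> List[Tuple[str, float]]:
    """Stage 2, SUNS role: predict(poi) gives the probability that the user likes the POI.
    Stage 3, SPARQL role: FILTER(?p < 1), ORDER BY DESC(?p), LIMIT k."""
    scored = [(pid, predict(pid)) for pid in pois_around(pois, lat, long)]
    not_yet_liked = [(pid, p) for pid, p in scored if p < 1]  # p == 1: already tweeted positively
    return sorted(not_yet_liked, key=lambda item: item[1], reverse=True)[:k]

# Toy data and a dummy predictor standing in for SUNS.
pois = [("poiA", 45.4656, 9.1900), ("poiB", 45.4540, 9.1800), ("poiFar", 46.0, 10.0)]
dummy_suns = {"poiA": 1.0, "poiB": 0.8, "poiFar": 0.9}.get
print(run_workflow(pois, 45.46, 9.19, dummy_suns))  # [('poiB', 0.8)]
```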
(Crazy) Plan
- [May 20th] Saltlux provides the POI ontology (together with the sample RDF data in SVN)
- [date???] then Siemens trains SUNS to make predictions
- [May 22nd-23rd-24th-25th-26th (one query per day)] for each query, Saltlux provides a sample answer
- [May 23rd-24th-25th-26th-27th (one workflow per day)] for each query and sample answer, Cefriel creates correct workflows with dummy plugins and sample intermediate results between plugins that, given the query, provide the sample answer
- [July 11th] for each workflow, each partner codes the correct plugins that, given the input, provide the expected output
  - SPARQL --> Saltlux
  - C-SPARQL --> Cefriel
  - SUNS --> Siemens
- [July 15th: Emanuele goes to Korea][between July 26th and 28th: Siemens and Saltlux go to Milano] for each query and sample answer, have the real workflow up and running
- [by the review?] connect BOTTARI to the workflows
Next Phone Call
May 25th at 10.00 CEST
