Data Sets
The current data sets were gathered from Twitter [from timeA to timeB] and contain:
- Ratings of POIs (points of interest) that users posted in their tweets
- Information about the POIs (such as location, category, ...)
Ratings of POIs
Statistics:
| | Count | No. of POIs | Non-zero entries % (sparsity) |
|---|---|---|---|
| Users | 31369 | | |
| Positive ratings | 19045 | 213 | 0.29% |
| Negative ratings | 14404 | 181 | 0.25% |
| Neutral ratings | 75941 | 245 | 0.99% |
| Total ratings | 109390 | | |
| Ratings per user | 3.49 | | |
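The percentages in the table appear to be the share of non-zero cells in a user-by-POI matrix, i.e. ratings / (users × POIs). This formula is an assumption inferred from the numbers, not something stated on this page. A minimal sketch that recomputes the figures under that assumption (the function and variable names are illustrative):

```python
def density_percent(n_ratings, n_users, n_pois):
    """Share of non-zero cells in an n_users x n_pois matrix, in percent."""
    return 100.0 * n_ratings / (n_users * n_pois)


N_USERS = 31369

# (number of ratings, number of distinct POIs) per rating type, from the table.
stats = {
    "positive": (19045, 213),
    "negative": (14404, 181),
    "neutral": (75941, 245),
}

densities = {
    name: round(density_percent(n_ratings, N_USERS, n_pois), 2)
    for name, (n_ratings, n_pois) in stats.items()
}
total_ratings = sum(n_ratings for n_ratings, _ in stats.values())
ratings_per_user = round(total_ratings / N_USERS, 2)

print(densities)         # {'positive': 0.29, 'negative': 0.25, 'neutral': 0.99}
print(total_ratings)     # 109390
print(ratings_per_user)  # 3.49
```

Under this reading, each of the three matrices is well below 1% filled.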
Questions/problems:
- Yi: A user can post several tweets about the very same POI.
- E.g.
    <twd:TwitterUser rdf:about="http://www.saltlux.com/larkc/lbsma/twitterdata#115564683">
      ...
      <twd:neutral rdf:resource="http://www.saltlux.com/geospatial#place_909"/> (twitterdata#11855941486, twitterdata#11851074583, ...)
      <twd:dislikes rdf:resource="http://www.saltlux.com/geospatial#place_909"/> (twitterdata#11662304895, twitterdata#12544728640, ...)
      <twd:likes rdf:resource="http://www.saltlux.com/geospatial#place_909"/> (twitterdata#12571513496, ...)
      ...
    </twd:TwitterUser>
- How to handle this?
- Option 1: average the values of all ratings this user gave to the POI, so that only a single relationship, rates, remains
- Option 2: model the positive, negative and neutral relationships separately and then combine them
  - I ran preliminary experiments, which showed that the combination outperformed using the positive relationship only.
- Option 3: like Option 1, but keep only the most recent rating (according to the timestamp)
- Note that we could assign the values 1, -1 and 0.1 to positive, negative and neutral ratings, respectively
- @Seonho: could you please check whether the multi-rated cases are plausible? That is, are the related tweets misclassified, or does the user really hold different opinions about the same POI (e.g. the user changed her mind over time, or she rated the POI on different aspects)?
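Options 1 and 3 above can be sketched as follows. This is only an illustration, assuming each tweet has already been reduced to a (user, POI, predicate, timestamp) tuple; the tuple layout, the function name `aggregate` and the sample data are hypothetical, while the values 1, -1 and 0.1 are the ones suggested in the note above.

```python
from collections import defaultdict

# Numeric values suggested in the notes for the three predicates.
RATING_VALUE = {"likes": 1.0, "dislikes": -1.0, "neutral": 0.1}


def aggregate(tweets, strategy="mean"):
    """Collapse multiple ratings per (user, POI) pair into a single value.

    tweets:   iterable of (user_id, poi_id, predicate, timestamp) tuples
              (hypothetical layout, not the actual data format).
    strategy: "mean" averages all ratings (Option 1),
              "last" keeps the most recent rating (Option 3).
    """
    grouped = defaultdict(list)
    for user, poi, predicate, timestamp in tweets:
        grouped[(user, poi)].append((timestamp, RATING_VALUE[predicate]))

    result = {}
    for key, rated in grouped.items():
        if strategy == "mean":
            result[key] = sum(value for _, value in rated) / len(rated)
        elif strategy == "last":
            # The entry with the largest timestamp wins.
            result[key] = max(rated, key=lambda r: r[0])[1]
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return result


# Made-up sample: one user rates the same POI three times.
sample = [
    ("user_a", "place_909", "neutral", 1),
    ("user_a", "place_909", "dislikes", 2),
    ("user_a", "place_909", "likes", 3),
    ("user_a", "place_7", "likes", 1),
]

mean_ratings = aggregate(sample, strategy="mean")
last_ratings = aggregate(sample, strategy="last")
```

With this sample, Option 1 yields a value close to zero for place_909 because the like and the dislike cancel out, whereas Option 3 yields 1.0 because the most recent tweet is positive. Which behaviour is preferable depends on the answer to the question above about whether the conflicting ratings are classification errors or genuine changes of opinion.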
