Add support for full_text in tweets; resolve #192. #252

Merged
ianmilligan1 merged 2 commits into archivesunleashed:master from ruebot:issue-192 on Aug 10, 2018

ruebot commented Aug 10, 2018

GitHub issue(s):
#192

What does this Pull Request do?

Adds a fullText accessor to the Tweet utility (TweetUtils) so that the full_text field of a tweet can be read alongside text.
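For readers who want a rough picture of what reading full_text from a parsed tweet can look like, here is a minimal sketch using json4s (the test output in this PR shows the tweets are loaded as org.json4s.JValue). It is not the code from this PR: the FullTextSketch object, the stringField helper, and the sample record are all hypothetical, and the choice to return an empty string for a missing field is an assumption, based only on the empty text values visible in the sample output.

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

// Hypothetical helper object for illustration; NOT the TweetUtils code from this PR.
object FullTextSketch {
  // Return the named top-level string field, or "" when the field is
  // absent or is not a JSON string (assumed fallback behaviour).
  def stringField(tweet: JValue, name: String): String =
    (tweet \ name) match {
      case JString(value) => value
      case _              => ""
    }

  def text(tweet: JValue): String = stringField(tweet, "text")

  def fullText(tweet: JValue): String = stringField(tweet, "full_text")
}

// Made-up record that carries full_text but no text field.
val sample = parse("""{"id_str": "1", "full_text": "an extended tweet"}""")

FullTextSketch.text(sample)     // ""
FullTextSketch.fullText(sample) // "an extended tweet"
```

The snippet is meant to be pasted into a Spark shell (or any Scala 2 REPL with json4s-jackson on the classpath). Pattern matching on JString avoids needing an implicit Formats instance, which keeps the sketch short.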

How should this be tested?

I tested this in the Spark shell with Apache Spark 2.3.1:

scala> :paste
// Entering paste mode (ctrl-D to finish)

import io.archivesunleashed._
import io.archivesunleashed.matchbox._
import io.archivesunleashed.util.TweetUtils._

// Load tweets from HDFS
val tweets = RecordLoader.loadTweets("/path/to/replies.jsonl", sc)

// Count them
tweets.count()

// Extract some fields
val r = tweets.map(tweet => (tweet.id, tweet.createdAt, tweet.username, tweet.text, tweet.fullText, tweet.lang,
                             tweet.isVerifiedUser, tweet.followerCount, tweet.friendCount))

// Take a sample of 10 on console
r.take(10)

// Exiting paste mode, now interpreting.

import io.archivesunleashed._
import io.archivesunleashed.matchbox._
import io.archivesunleashed.util.TweetUtils._
tweets: org.apache.spark.rdd.RDD[org.json4s.JValue] = MapPartitionsRDD[28] at filter at package.scala:65
r: org.apache.spark.rdd.RDD[(String, String, String, String, String, String, Boolean, Int, Int)] = MapPartitionsRDD[29] at map at <console>:50
res1: Array[(String, String, String, String, String, String, Boolean, Int, Int)] = Array((1004742470700322816,Thu Jun 07 15:10:46 +0000 2018,realDonaldTrump,"",When will people start saying, “thank you, Mr. President, for firing James Comey?”,en,true,52543149,46), (1005275688990052352,Sat Jun 09 02:29:36 +0000 2018,love4All_7,"",@realDonaldTrump Pleases look into my brothers case. Miskin Kamara, he's been locked up for 13+ years w...

If you want to use the same tweet set, it's here.

Additional Notes:

Once this is good to go, I'll take care of #194.

Interested parties

@ianmilligan1 @lintool

@ianmilligan1

Tested and works very well. Can merge once the lights turn green.

codecov bot Aug 10, 2018

Codecov Report

Merging #252 into master will increase coverage by 0.02%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #252      +/-   ##
==========================================
+ Coverage   70.11%   70.14%   +0.02%     
==========================================
  Files          41       41              
  Lines        1024     1025       +1     
  Branches      191      191              
==========================================
+ Hits          718      719       +1     
  Misses        240      240              
  Partials       66       66

Impacted Files                                          Coverage Δ
...n/scala/io/archivesunleashed/util/TweetUtils.scala   92.3% <100%> (+0.64%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a5fa151...a4aba17.

@ianmilligan1 ianmilligan1 merged commit 62628b4 into archivesunleashed:master Aug 10, 2018

3 checks passed

codecov/patch 100% of diff hit (target 70.11%)
codecov/project 70.14% (+0.02%) compared to a5fa151
continuous-integration/travis-ci/pr The Travis CI build passed

@ruebot ruebot deleted the ruebot:issue-192 branch Aug 10, 2018
