Showing 16 changed files with 1,795 additions and 2 deletions.
- +26 −0 .codecov.yml
- +20 −0 .gitignore
- +27 −0 .travis.yml
- +81 −0 CODE_OF_CONDUCT.md
- +53 −0 CONTRIBUTING.md
- +1 −1 LICENSE
- +88 −1 README.md
- +13 −0 config/LICENSE_HEADER.txt
- +189 −0 config/checkstyle/scalastyle_config.xml
- +14 −0 config/checkstyle/suppressions.xml
- +615 −0 pom.xml
- +13 −0 src/main/assembly/python.xml
- +45 −0 src/main/scala/io/archivesunleashed/twut/package.scala
- +10 −0 src/test/resources/10-sample.jsonl
- +100 −0 src/test/resources/100-sample.jsonl
- +500 −0 src/test/resources/500-sample.jsonl
@@ -0,0 +1,26 @@ | ||
codecov:
  notify:
    require_ci_to_pass: yes

coverage:
  precision: 2
  round: down
  range: "50...80"

  status:
    project: yes
    patch: yes
    changes: no

parsers:
  gcov:
    branch_detection:
      conditional: yes
      loop: yes
      method: no
      macro: no

comment:
  layout: "header, diff"
  behavior: default
  require_changes: no
@@ -0,0 +1,20 @@ | ||
.DS_Store | ||
.classpath | ||
.project | ||
target/ | ||
.idea/ | ||
*.iml | ||
*~ | ||
src/main/solr/lib/ | ||
.gradle | ||
.settings | ||
.*.swp | ||
workbench.xmi | ||
build | ||
derby.log | ||
metastore_db | ||
__pycache__/ | ||
src/main/python/tf/model.zip | ||
src/main/python/tf/util/spark.conf | ||
src/main/python/tf/model/graph/ | ||
src/main/python/tf/model/category/ |
@@ -0,0 +1,27 @@ | ||
dist: xenial | ||
language: java | ||
|
||
branches: | ||
only: | ||
- master | ||
|
||
jdk: | ||
- openjdk11 | ||
|
||
before_install: | ||
- "echo $JAVA_OPTS" | ||
- "export JAVA_OPTS=-Xmx512m" | ||
- "export MAVEN_OPTS=-Dorg.slf4j.simpleLogger.defaultLogLevel=warn" | ||
|
||
script: | ||
- mvn install -B -V | ||
- mvn javadoc:jar | ||
- mvn javadoc:test-aggregate | ||
- mvn site | ||
|
||
after_success: | ||
- bash <(curl -s https://codecov.io/bash) | ||
|
||
notifications: | ||
slack: | ||
secure: CCjvzkv9khqeAIgbMjXnIoQi0qZ55K6RtxGk9bqzY+r/xiUTmgat9N9+Alyuq3kK9rNNoQZQwR9rOyvPf9ymkifFnGxSglBSLHXzpxnftwOCasB0wf0OkENYfa8BrDhSk9EZPHsfGNqtcb5tm6/hLK2Kd49qGkYkT1ct3O0jWwWsn0SOmyNh2znIxMwCGKUMmrk/opVEKLvXmZRM7jCStCzFRrfR/d0QrPa9MYOLaFy75bVK8NcIJd4s6seOMf9OifBnfE34FY9DOL8fWnZEIx9eG6ajMYDP+6gn/v9JOZoNybfTojrpsWqCK1ytItzeToMAz9n8ULB0sUXAY0zk5u1VMaWQa9w/769hwATkNv49GI/MLM2apJY2HaBvzPizWIrVpR89uilM+pxUaH51D94cnWjtVLaSt7BMJ1K/dy2hpEaBElmG0iWYsqpdpKTJkVCDOYxs8sumEFsvIUWcQkiuk5EKrxfAjqcUpf5yTvkhFtkiIU2oxf2sGXXVFGocM+dpzbFXlhmk76caeRD+tw9bNfDAbuy7JjEfVS7ls3gmUHu3298JZhfiR89YxBx7BDZ7Kr9vurdXaYihoCqkXykw8D7MiZGRcdMJbGmRGmsILho9KtlJJsP7BNG6W3uA/z5gRzlV3RJVjXigWDCpOUxp+TVNP9ug4ymmSf2g6cQ= |
@@ -0,0 +1,81 @@ | ||
# Archives Unleashed Project Code of Conduct | ||
|
||
## Our Pledge | ||
|
||
* The Archives Unleashed Project believes in supporting an open, inclusive, and
  diverse community which respects the experience, expertise, and knowledge of
  all community members.
* The Archives Unleashed community is dedicated to providing a harassment-free
  experience for everyone, and welcomes individuals regardless of age, body size,
  disability, ethnicity, gender identity and expression, level of experience,
  nationality, personal appearance, race, religion, or sexual identity and
  orientation.
* To foster respectful collaborations, this code of conduct applies to all
  Archives Unleashed spaces, including, but not limited to, GitHub, Slack,
  Medium, social media platforms, and meeting spaces, both online and off.
* Anyone who violates this code of conduct may be sanctioned or expelled from
  these spaces at the discretion of the Archives Unleashed Project Team.
|
||
## Our Standards | ||
|
||
Examples of behavior that contributes to creating a positive environment | ||
include: | ||
|
||
* Using welcoming and inclusive language | ||
* Being respectful of differing viewpoints and experiences | ||
* Gracefully accepting constructive criticism | ||
* Focusing on what is best for the community | ||
* Showing empathy towards other community members | ||
|
||
Examples of unacceptable behavior by participants include: | ||
|
||
* The use of sexualized language or imagery and unwelcome sexual attention or | ||
advances | ||
* Trolling, insulting/derogatory comments, and personal or political attacks | ||
* Public or private harassment | ||
* Publishing others' private information, such as a physical or electronic | ||
address, without explicit permission | ||
* Other conduct which could reasonably be considered inappropriate in a | ||
professional setting | ||
|
||
## Our Responsibilities | ||
|
||
Project maintainers are responsible for clarifying the standards of acceptable | ||
behavior and are expected to take appropriate and fair corrective action in | ||
response to any instances of unacceptable behavior. | ||
|
||
Project maintainers have the right and responsibility to remove, edit, or | ||
reject comments, commits, code, wiki edits, issues, and other contributions | ||
that are not aligned to this Code of Conduct, or to ban temporarily or | ||
permanently any contributor for other behaviors that they deem inappropriate, | ||
threatening, offensive, or harmful. | ||
|
||
## Scope | ||
|
||
This Code of Conduct applies both within project spaces and in public spaces | ||
when an individual is representing the project or its community. Examples of | ||
representing a project or community include using an official project e-mail | ||
address, posting via an official social media account, or acting as an appointed | ||
representative at an online or offline event. Representation of a project may be | ||
further defined and clarified by project maintainers. | ||
|
||
## Enforcement | ||
|
||
Instances of abusive, harassing, or otherwise unacceptable behavior may be | ||
reported by contacting the project team at archivesunleashed@gmail.com. All | ||
complaints will be reviewed and investigated and will result in a response that | ||
is deemed necessary and appropriate to the circumstances. The project team is | ||
obligated to maintain confidentiality with regard to the reporter of an incident. | ||
Further details of specific enforcement policies may be posted separately. | ||
|
||
Project maintainers who do not follow or enforce the Code of Conduct in good | ||
faith may face temporary or permanent repercussions as determined by other | ||
members of the project's leadership. | ||
|
||
## Attribution | ||
|
||
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, | ||
available at [http://contributor-covenant.org/version/1/4][version] | ||
|
||
[homepage]: http://contributor-covenant.org | ||
[version]: http://contributor-covenant.org/version/1/4/ |
@@ -0,0 +1,53 @@ | ||
# Welcome! | ||
|
||
If you are reading this document, then you are interested in contributing to The Archives Unleashed Project. All contributions are welcome: use cases, documentation, code, patches, bug reports, feature requests, etc. You do not need to be a programmer to speak up!
|
||
### Use cases | ||
|
||
If you would like to submit a use case for The Archives Unleashed Toolkit, please submit an issue [here](https://github.com/archivesunleashed/twut/issues/new), and begin the issue title with "Use Case:".
|
||
### Documentation | ||
|
||
You can contribute documentation in two different ways. One way is to create an issue [here](https://github.com/archivesunleashed/twut/issues/new) and begin the issue title with "Documentation:". | ||
|
||
### Request a new feature | ||
|
||
To request a new feature, [open an issue](https://github.com/archivesunleashed/twut/issues/new) or create a use case as described in the _Use cases_ section above, and summarize the desired functionality. Begin the issue title with "Enhancement:".
|
||
### Report a bug | ||
|
||
To report a bug you should [open an issue](https://github.com/archivesunleashed/twut/issues/new) that summarizes the bug. Set the label to "bug". | ||
|
||
To help us understand and fix the bug, please provide:

1. The steps to reproduce the bug. This includes, for example, the version of The Archives Unleashed Toolkit you were using and whether you ran it on a single node or a cluster.
2. The expected behavior.
3. The actual, incorrect behavior.
|
||
Feel free to search the issue queue for existing issues (aka tickets) that already describe the problem; if there is such a ticket please add your information as a comment. | ||
|
||
### Contribute code | ||
|
||
_If you are interested in contributing code to The Archives Unleashed Toolkit but do not know where to begin:_ | ||
|
||
In this case you should [browse open issues](https://github.com/archivesunleashed/twut/issues). | ||
|
||
Contributions to The Archives Unleashed Toolkit codebase should be sent as GitHub pull requests. See the section _Create a pull request_ below for details. If there is any problem with the pull request, we can work through it using the commenting features of GitHub.
|
||
* For _small patches_, feel free to submit pull requests directly for those patches. | ||
* For _larger code contributions_, please use the following process. The idea behind this process is to prevent any wasted work and catch design issues early on. | ||
|
||
1. [Open an issue](https://github.com/archivesunleashed/twut/issues), if a similar issue does not exist already. If a similar issue does exist, then you may consider participating in the work on the existing issue. | ||
2. Comment on the issue with your plan for implementing the issue. Explain what pieces of the codebase you are going to touch and how everything is going to fit together. | ||
3. The Archives Unleashed Toolkit committers will work with you on the design to make sure you are on the right track. | ||
4. Implement your issue, create a pull request (see below), and iterate from there. | ||
|
||
### Create a pull request | ||
|
||
Take a look at [Creating a pull request](https://help.github.com/articles/creating-a-pull-request). In a nutshell you need to: | ||
|
||
1. [Fork](https://help.github.com/articles/fork-a-repo) The Archives Unleashed Toolkit GitHub repository at [https://github.com/archivesunleashed/twut](https://github.com/archivesunleashed/twut) to your personal GitHub account.
2. Commit any changes to your fork.
3. Send a [pull request](https://help.github.com/articles/creating-a-pull-request) to The Archives Unleashed Toolkit GitHub repository that you forked in step 1. If your pull request is related to an existing issue -- for instance, because you reported a [bug/issue](https://github.com/archivesunleashed/twut/issues) earlier -- prefix the title of your pull request with the corresponding issue number (e.g. `issue-123: ...`). Please also include a reference to the issue in the description of the pull request, using '#' plus the issue number (for example, '#123'). Also try to pick an appropriate name for the branch from which you are issuing the pull request.
|
||
You may want to read [Syncing a fork](https://help.github.com/articles/syncing-a-fork) for instructions on how to keep your fork up to date with the latest changes of the upstream (official) `twut` repository. |
@@ -1 +1,88 @@ | ||
# twut | ||
# twut | ||
|
||
[![Build Status](https://travis-ci.org/archivesunleashed/twut.svg?branch=master)](https://travis-ci.org/archivesunleashed/twut) | ||
[![codecov](https://codecov.io/gh/archivesunleashed/twut/branch/master/graph/badge.svg)](https://codecov.io/gh/archivesunleashed/twut) | ||
[![Maven Central](https://maven-badges.herokuapp.com/maven-central/io.archivesunleashed/twut/badge.svg)](https://maven-badges.herokuapp.com/maven-central/io.archivesunleashed/twut) | ||
[![Javadoc](https://javadoc-badge.appspot.com/io.archivesunleashed/twut.svg?label=javadoc)](http://api.docs.archivesunleashed.io/0.18.0/apidocs/index.html) | ||
[![Scaladoc](https://javadoc-badge.appspot.com/io.archivesunleashed/twut.svg?label=scaladoc)](http://api.docs.archivesunleashed.io/0.18.0/scaladocs/index.html) | ||
[![LICENSE](https://img.shields.io/badge/license-Apache-blue.svg?style=flat)](https://www.apache.org/licenses/LICENSE-2.0) | ||
[![Contribution Guidelines](http://img.shields.io/badge/CONTRIBUTING-Guidelines-blue.svg)](./CONTRIBUTING.md) | ||
|
||
An open-source toolkit for analyzing line-oriented JSON Twitter archives with Apache Spark. | ||
|
||
## Getting Started | ||
|
||
### Easy | ||
|
||
If you have Apache Spark ready to go, it's as easy as: | ||
|
||
``` | ||
$ spark-shell --packages "io.archivesunleashed:twut:0.0.1-SNAPSHOT" | ||
``` | ||
|
||
### A little less easy | ||
|
||
You can download the [latest release here](https://github.com/archivesunleashed/twut/releases) and include it like so: | ||
|
||
``` | ||
$ spark-shell --jars /path/to/twut-0.0.1-SNAPSHOT-fatjar.jar
``` | ||
|
||
## Usage | ||
|
||
`twut` expects Tweets to be supplied in a DataFrame. | ||
|
||
Example: | ||
|
||
``` | ||
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.0-preview
      /_/
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_222) | ||
Type in expressions to have them evaluated. | ||
Type :help for more information. | ||
scala> import io.archivesunleashed.twut._ | ||
import io.archivesunleashed.twut._ | ||
scala> val tweets = "/home/nruest/Projects/au/twut/src/test/resources/10-sample.jsonl" | ||
tweets: String = /home/nruest/Projects/au/twut/src/test/resources/10-sample.jsonl | ||
scala> val tweetsDF = spark.read.json(tweets) | ||
19/12/02 13:38:51 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'. | ||
tweetsDF: org.apache.spark.sql.DataFrame = [contributors: string, coordinates: string ... 33 more fields] | ||
scala> twut.ids(tweetsDF).show | ||
+-------------------+ | ||
| id_str| | ||
+-------------------+ | ||
|1201505319257403392| | ||
|1201505319282565121| | ||
|1201505319257608197| | ||
|1201505319261655041| | ||
|1201505319261597696| | ||
|1201505319274332165| | ||
|1201505319261745152| | ||
|1201505319270146049| | ||
|1201505319286755328| | ||
|1201505319286984705| | ||
+-------------------+ | ||
``` | ||
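
For readers without a Spark installation at hand, the following is a minimal plain-Python sketch of what the session above does conceptually: read line-oriented JSON (one tweet object per line) and collect each tweet's `id_str`. It is not part of `twut` and does not use Spark; the function name `tweet_ids` and the two sample records (ids borrowed from the output above, text invented) are assumptions made purely for illustration.

```python
import json
from typing import Iterable, Iterator


def tweet_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield the id_str of each tweet in line-oriented JSON input.

    Hypothetical helper for illustration; not twut's implementation.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        tweet = json.loads(line)  # each non-blank line is one JSON object
        if "id_str" in tweet:
            yield tweet["id_str"]


# Two illustrative records standing in for the lines of a .jsonl file.
sample = [
    '{"id_str": "1201505319257403392", "text": "first"}',
    '{"id_str": "1201505319282565121", "text": "second"}',
]
ids = list(tweet_ids(sample))
print(ids)
```

Because `tweet_ids` accepts any iterable of strings, an open file handle can be passed directly, for example `with open(path) as f: ids = list(tweet_ids(f))`. Spark remains the better fit for large archives, since it distributes the same per-line parsing across a cluster.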
|
||
## Documentation! Or, how do I use this? | ||
|
||
Once built or downloaded, you can follow the basic set of recipes and tutorials [here](https://github.com/archivesunleashed/twut/wiki/). | ||
|
||
# License | ||
|
||
Licensed under the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0). | ||
|
||
# Acknowledgments | ||
|
||
This work is primarily supported by the [Andrew W. Mellon Foundation](https://mellon.org/). Other financial and in-kind support comes from the [Social Sciences and Humanities Research Council](http://www.sshrc-crsh.gc.ca/), [Compute Canada](https://www.computecanada.ca/), the [Ontario Ministry of Research, Innovation, and Science](https://www.ontario.ca/page/ministry-research-innovation-and-science), [York University Libraries](https://www.library.yorku.ca/web/), [Start Smart Labs](http://www.startsmartlabs.com/), and the [Faculty of Arts](https://uwaterloo.ca/arts/) and [David R. Cheriton School of Computer Science](https://cs.uwaterloo.ca/) at the [University of Waterloo](https://uwaterloo.ca/). | ||
|
||
Any opinions, findings, and conclusions or recommendations expressed are those of the researchers and do not necessarily reflect the views of the sponsors. |
@@ -0,0 +1,13 @@ | ||
Copyright © ${project.inceptionYear} ${owner} | ||
|
||
Licensed under the Apache License, Version 2.0 (the "License"); | ||
you may not use this file except in compliance with the License. | ||
You may obtain a copy of the License at | ||
|
||
http://www.apache.org/licenses/LICENSE-2.0 | ||
|
||
Unless required by applicable law or agreed to in writing, software | ||
distributed under the License is distributed on an "AS IS" BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
See the License for the specific language governing permissions and | ||
limitations under the License. |
0 comments on commit 674b8e9