Scala assigns the results to `r` in this case, which you can then subsequently manipulate.
If you want _all_ results, replace `.take(10)` with `.collect()`.
This will return _all_ results to the console.
**WARNING**: Be careful with `.collect()`! If your results contain ten million records, TWUT will try to return _all of them_ to your console (on your physical machine).
Most likely, your machine won't have enough memory!
Alternatively, if you want to save the results to disk, replace `.show(20, false)` with the following:
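A sketch of what this looks like, assuming your results were assigned to `r` as above (the directory path is a placeholder):

```scala
r.write.csv("/path/to/export/directory/")
```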
Note that this works even across languages (e.g., export to Parquet from Scala, then load the results in Python).
## Python
In Python, the steps are basically the same as in Scala, just with Python syntax; we lay them out explicitly below.
If you want to return a set of results, the Python counterpart of Scala's `.take(10)` is `.head(10)`.
So, something like (in Python):
```python
(SelectTweet.ids(df)
  # more transformations here...
  .head(10))
```
In the PySpark console, the results are returned as a list of `Row` objects.
You can assign the transformations to a variable, like this:

```python
tweet_ids = SelectTweet.ids(df)
# more transformations here...
tweet_ids.head(10)
```
If you want _all_ results, replace `.head(10)` with `.collect()`.
This will return _all_ results to the console.
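That is, sketched against the same pipeline as above:

```python
SelectTweet.ids(df).collect()  # materializes *all* matching rows on the driver
```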
**WARNING**: Be careful with `.collect()`! If your results contain ten million records, TWUT will try to return _all of them_ to your console (on your physical machine).
Most likely, your machine won't have enough memory!
Alternatively, if you want to save the results to disk, replace `.head(10)` with the following:
```python
tweet_ids.write.csv("/path/to/export/directory/")
```
Replace `/path/to/export/directory/` with your desired location.
Note that this is a _directory_, not a _file_.
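Spark writes one or more part files (plus a `_SUCCESS` marker) into that directory; the layout looks roughly like this (exact file names will vary):

```
/path/to/export/directory/
├── _SUCCESS
└── part-00000-<uuid>.csv
```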
Depending on your intended use of the output, you may want to include headers in the CSV file, in which case:
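One way to do this is with Spark's `header` write option, sketched against the same `tweet_ids` DataFrame as above:

```python
tweet_ids.write.option("header", "true").csv("/path/to/export/directory/")
```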