4 changes: 2 additions & 2 deletions sdks/python/apache_beam/examples/wordcount_rust/README.md
@@ -33,7 +33,7 @@ This will compile the Rust code and build a Python package linked to it in the c
To execute wordcount locally using the direct runner, execute the following from the wordcount_rust directory within the same virtual environment:

```bash
-python wordcount.py --runner DirectRunner --input * --output counts.txt
+python wordcount_rust.py --runner DirectRunner --input * --output counts.txt

medium

Using an unquoted wildcard * for the --input argument causes the shell to expand it to every file in the current directory before the arguments reach Python. Typically only the first matched file name is then bound to --input, and the remaining names can surface as argument parsing errors (e.g., unrecognized arguments). Quote the wildcard so that it is passed literally to the Beam pipeline.

Suggested change
-python wordcount_rust.py --runner DirectRunner --input * --output counts.txt
+python wordcount_rust.py --runner DirectRunner --input "*" --output counts.txt

```
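
The effect described in the review comment can be reproduced without a shell. The following is a minimal sketch, not the parser used by `wordcount_rust.py`: it assumes a stand-in `argparse` parser with single-valued `--input` and `--output` flags, and the file names in `expanded` are hypothetical examples of what a shell might substitute for an unquoted `*`.

```python
import argparse


def parse(argv):
    # Stand-in parser (an assumption): mirrors only the two flags from the README command.
    parser = argparse.ArgumentParser()
    parser.add_argument("--input")
    parser.add_argument("--output")
    # parse_known_args returns leftovers instead of exiting, so they can be inspected.
    return parser.parse_known_args(argv)


# argv as the program would see it after the shell expanded an unquoted *.
# The file names are hypothetical.
expanded = [
    "--input", "Cargo.toml", "README.md", "wordcount_rust.py",
    "--output", "counts.txt",
]
known, extra = parse(expanded)
print(known.input)  # only the first expanded name is bound to --input
print(extra)        # the remaining names are left over as unrecognized arguments

# argv when the wildcard is quoted: the literal pattern reaches the program.
quoted = ["--input", "*", "--output", "counts.txt"]
known_q, extra_q = parse(quoted)
print(known_q.input)
print(extra_q)
```

With the quoted form, the pattern arrives intact and the pipeline's own file matching can resolve it, which is the behavior the suggested change relies on.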

To execute wordcount using the Dataflow runner, the tarball of the PyO3 Rust package must be provided to GCP. This is done by building the tarball then providing it as an `extra_package` argument. The tarball can be built using the following command from the wordcount_rust directory:
@@ -45,7 +45,7 @@ python -m build --sdist
This places the tarball in `./word_processing/dist` as `word_processing-0.1.0.tar.gz`. Job submission to Dataflow from the `wordcount_rust` directory then looks like the following:

```bash
-python wordcount.py --runner DataflowRunner --input gs://apache-beam-samples/shakespeare/*.txt --output gs://<YOUR_BUCKET>/wordcount_rust/counts.txt --project <YOUR_PROJECT> --region <YOUR_REGION> --extra_package ./word_processing/dist/word_processing-0.1.0.tar.gz
+python wordcount_rust.py --runner DataflowRunner --input gs://apache-beam-samples/shakespeare/*.txt --output gs://<YOUR_BUCKET>/wordcount_rust/counts.txt --project <YOUR_PROJECT> --region <YOUR_REGION> --extra_package ./word_processing/dist/word_processing-0.1.0.tar.gz
```

The job will then execute on Dataflow, installing the Rust package during worker setup. Wordcount will then execute and produce a counts.txt file in the specified output bucket.
@@ -15,5 +15,5 @@
# limitations under the License.
#

-build=1.3.0
+build==1.3.0
maturin==1.11.2