System information
Python dependencies (from pip freeze output):
absl-py==1.4.0 annotated-types==0.7.0 anyio==4.4.0 apache-beam==2.56.0 argon2-cffi==23.1.0 argon2-cffi-bindings==21.2.0 array_record==0.5.1 arrow==1.3.0 asttokens==2.4.1 astunparse==1.6.3 async-lru==2.0.4 async-timeout==4.0.3 attrs==23.2.0 Babel==2.15.0 backcall==0.2.0 beautifulsoup4==4.12.3 bleach==6.1.0 cachetools==5.3.3 certifi==2024.6.2 cffi==1.16.0 charset-normalizer==3.3.2 click==8.1.7 cloudpickle==2.2.1 colorama==0.4.6 comm==0.2.2 contourpy==1.2.1 cramjam==2.8.3 crcmod==1.7 cycler==0.12.1 debugpy==1.8.1 decorator==5.1.1 defusedxml==0.7.1 dill==0.3.1.1 dm-tree==0.1.8 dnspython==2.6.1 docker==4.4.4 docopt==0.6.2 docstring_parser==0.16 etils==1.7.0 exceptiongroup==1.2.1 executing==2.0.1 facets-overview==1.1.1 fastavro==1.9.4 fasteners==0.19 fastjsonschema==2.20.0 flatbuffers==24.3.25 fonttools==4.53.0 fqdn==1.5.1 fsspec==2024.6.0 gast==0.5.4 google-api-core==2.19.0 google-api-python-client==1.12.11 google-apitools==0.5.31 google-auth==2.30.0 google-auth-httplib2==0.2.0 google-auth-oauthlib==1.2.0 google-cloud-aiplatform==1.56.0 google-cloud-bigquery==3.24.0 google-cloud-bigquery-storage==2.25.0 google-cloud-bigtable==2.24.0 google-cloud-core==2.4.1 google-cloud-dataproc==5.9.3 google-cloud-datastore==2.19.0 google-cloud-dlp==3.18.0 google-cloud-language==2.13.3 google-cloud-pubsub==2.21.3 google-cloud-pubsublite==1.10.0 google-cloud-recommendations-ai==0.10.10 google-cloud-resource-manager==1.12.3 google-cloud-spanner==3.47.0 google-cloud-storage==2.17.0 google-cloud-videointelligence==2.13.3 google-cloud-vision==3.7.2 google-crc32c==1.5.0 google-pasta==0.2.0 google-resumable-media==2.7.1 googleapis-common-protos==1.63.1 grpc-google-iam-v1==0.13.0 grpc-interceptor==0.15.4 grpcio==1.64.1 grpcio-status==1.48.2 h11==0.14.0 h5py==3.11.0 hdfs==2.7.3 httpcore==1.0.5 httplib2==0.22.0 httpx==0.27.0 idna==3.7 immutabledict==4.2.0 importlib_resources==6.4.0 ipykernel==6.29.4 ipython==8.25.0 ipython-genutils==0.2.0 ipywidgets==8.1.3 isoduration==20.11.0
jedi==0.19.1 Jinja2==3.1.4 joblib==1.4.2 Js2Py==0.74 json5==0.9.25 jsonpickle==3.2.1 jsonpointer==3.0.0 jsonschema==4.22.0 jsonschema-specifications==2023.12.1 jupyter-events==0.10.0 jupyter-lsp==2.2.5 jupyter_client==8.2.0 jupyter_core==5.7.2 jupyter_server==2.14.1 jupyter_server_terminals==0.5.3 jupyterlab==4.2.2 jupyterlab_pygments==0.3.0 jupyterlab_server==2.27.2 jupyterlab_widgets==3.0.11 keras==2.15.0 keras-tuner==1.4.7 kiwisolver==1.4.5 kt-legacy==1.0.5 kubernetes==12.0.1 libclang==18.1.1 lxml==5.2.2 Markdown==3.6 MarkupSafe==2.1.5 matplotlib==3.9.0 matplotlib-inline==0.1.7 mistune==3.0.2 ml-dtypes==0.3.2 ml-metadata==1.15.0 ml-pipelines-sdk==1.15.1 mplcyberpunk==0.7.1 nbclient==0.10.0 nbconvert==7.16.4 nbformat==5.10.4 nest-asyncio==1.6.0 nltk==3.8.1 notebook==7.2.1 notebook_shim==0.2.4 numpy==1.26.4 oauth2client==4.1.3 oauthlib==3.2.2 objsize==0.7.0 opt-einsum==3.3.0 orjson==3.10.5 overrides==7.7.0 packaging==24.1 pandas==1.5.3 pandocfilters==1.5.1 parso==0.8.4 pathlib==1.0.1 pexpect==4.9.0 pickleshare==0.7.5 pillow==10.3.0 platformdirs==4.2.2 portalocker==2.8.2 portpicker==1.6.0 prometheus_client==0.20.0 promise==2.3 prompt_toolkit==3.0.47 proto-plus==1.23.0 protobuf==3.20.3 psutil==5.9.8 ptyprocess==0.7.0 pure-eval==0.2.2 pyarrow==10.0.1 pyarrow-hotfix==0.6 pyasn1==0.6.0 pyasn1_modules==0.4.0 pycparser==2.22 pydantic==2.7.4 pydantic_core==2.18.4 pydot==1.4.2 pyfarmhash==0.3.2 Pygments==2.18.0 pyjsparser==2.7.1 pymongo==4.7.3 pyparsing==3.1.2 python-dateutil==2.9.0.post0 python-json-logger==2.0.7 python-snappy==0.7.2 pytz==2024.1 PyYAML==6.0.1 pyzmq==26.0.3 redis==5.0.6 referencing==0.35.1 regex==2024.5.15 requests==2.32.3 requests-oauthlib==2.0.0 rfc3339-validator==0.1.4 rfc3986-validator==0.1.1 rouge_score==0.1.2 rpds-py==0.18.1 rsa==4.9 sacrebleu==2.4.2 scipy==1.12.0 seaborn==0.13.2 Send2Trash==1.8.3 shapely==2.0.4 simple_parsing==0.1.5 six==1.16.0 sniffio==1.3.1 soupsieve==2.5 sqlparse==0.5.0 stack-data==0.6.3 tabulate==0.9.0 tensorboard==2.15.2 
tensorboard-data-server==0.7.2 tensorflow==2.15.1 tensorflow-data-validation==1.15.1 tensorflow-datasets==4.9.6 tensorflow-estimator==2.15.0 tensorflow-hub==0.15.0 tensorflow-io-gcs-filesystem==0.37.0 tensorflow-metadata==1.15.0 tensorflow-serving-api==2.15.1 tensorflow-transform==1.15.0 tensorflow_model_analysis==0.46.0 termcolor==2.4.0 terminado==0.18.1 tfx==1.15.1 tfx-bsl==1.15.1 timeloop==1.0.2 tinycss2==1.3.0 toml==0.10.2 tomli==2.0.1 tornado==6.4.1 tqdm==4.66.4 traitlets==5.14.3 types-python-dateutil==2.9.0.20240316 typing_extensions==4.12.2 tzlocal==5.2 uri-template==1.3.0 uritemplate==3.0.1 urllib3==2.2.2 wcwidth==0.2.13 webcolors==24.6.0 webencodings==0.5.1 websocket-client==1.8.0 Werkzeug==3.0.3 widgetsnbextension==4.0.11 wrapt==1.14.1 zipp==3.19.2 zstandard==0.22.0
Current Behavior
I have a simple pipeline consisting of only one component (CSVExampleGen) to ingest csv files and convert them to TFRecords.
However, upon running the pipeline I get the following warning:
WARNING:apache_beam.io.tfrecordio:Couldn't find python-snappy so the implementation of _TFRecordUtil._masked_crc32c is not as fast as it could be.
I have already installed python-snappy and the corresponding C library using the following commands:
Expected behavior
The execution of this simple pipeline should be much faster, and no such warning should be produced.
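One way to narrow this down is to check, from the same kernel that runs the pipeline, whether `snappy` imports at all and whether it exposes a CRC32C helper. The sketch below is a diagnostic only. It assumes (not confirmed in this report) that Beam's TFRecord code looks for a private `_crc32c` attribute either on the `snappy` module or on a `snappy._snappy` submodule, and logs the warning when neither is present. The helper names `find_crc32c` and `diagnose` are hypothetical. Note that the dependency list above shows `python-snappy==0.7.2` installed alongside `cramjam==2.8.3`, so the package may import fine and still lack that private helper.

```python
import importlib
from types import SimpleNamespace


def find_crc32c(module):
    """Return the CRC32C callable a snappy-like module exposes, or None.

    Assumption: the helper, if present, is a private attribute named
    `_crc32c`, found on the module itself or on a `_snappy` submodule.
    """
    fn = getattr(module, "_crc32c", None)
    if fn is not None:
        return fn
    ext = getattr(module, "_snappy", None)
    # getattr on None simply falls through to the default.
    return getattr(ext, "_crc32c", None)


def diagnose(module_name="snappy"):
    """Describe whether `module_name` imports and exposes a CRC32C helper."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        return f"import failed: {exc}"
    if find_crc32c(module) is None:
        version = getattr(module, "__version__", "unknown")
        return f"imported (version {version}) but no _crc32c helper found"
    return "imported and _crc32c helper found"
```

Running `print(diagnose())` in the notebook distinguishes two cases: an "import failed" result points to an environment problem (for example, the package was installed into a different interpreter than the kernel uses), while an "imported ... but no _crc32c helper found" result suggests the installed python-snappy version does not provide the function being looked for.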
Standalone code to reproduce the issue
Download any moderately sized csv file with numerical data and run the following code: