
spark = SparkSession.builder.appName('demo').master("local").getOrCreate()
df = spark.read.format("org.apache.spark.sql.cassandra").options(table="testing123", keyspace="test").load()

Verify that the transfer has occurred by printing the number of rows in the dataframe.
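A minimal sketch of that row-count check, assuming the Spark-Cassandra connector is already on the Spark classpath and Cassandra is running locally. The helper names `load_cassandra_table` and `count_rows` are made up for illustration; only `main()` needs PySpark installed.

```python
# Data source name that the Spark-Cassandra connector registers with Spark SQL.
CASSANDRA_FORMAT = "org.apache.spark.sql.cassandra"


def load_cassandra_table(spark, keyspace, table):
    """Return a DataFrame backed by the given Cassandra table."""
    return (
        spark.read.format(CASSANDRA_FORMAT)
        .options(table=table, keyspace=keyspace)
        .load()
    )


def count_rows(spark, keyspace="test", table="testing123"):
    """Load the table and return its row count."""
    return load_cassandra_table(spark, keyspace, table).count()


def main():
    # Imported here so the helpers above can be read without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("demo").master("local").getOrCreate()
    print("rows in test.testing123:", count_rows(spark))
    spark.stop()
```

Calling `main()` from a notebook cell or a script should print a non-zero count once the table holds data.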
Note: We are working only with PySpark in this case, and only DataFrames are available.

pip install cassandra-driver

export PYSPARK_DRIVER_PYTHON_OPTS='notebook'

These commands will launch Jupyter Notebook on localhost:8888. The downside is that if you have existing notebooks you won't be able to navigate to them. Not the best solution, but it will do to be able to use all these pieces together!

Create a New Notebook

Create a SparkSession and load the dataframe from the Apache Cassandra table. Test the connection out first, using the keyspace and table created for this walkthrough.

#Create a dataframe from a table that we created above
spark.read.format('org.apache.spark.sql.cassandra').options(table='testing123', keyspace='test').load().show()
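The notebook launch can also be scripted. The sketch below only assembles the environment and command line; it assumes `PYSPARK_DRIVER_PYTHON=jupyter` is set alongside the `PYSPARK_DRIVER_PYTHON_OPTS` export (a common pairing, not something the text above states) and that the connector is pulled in with the `--packages` option of `pyspark`. The function name `build_pyspark_launch` is hypothetical, and the connector coordinate is left as a parameter because the right version depends on your Spark and Scala versions.

```python
import os


def build_pyspark_launch(spark_home, connector_package, base_env=None):
    """Return (env, argv) for starting PySpark inside a Jupyter notebook.

    connector_package is a Maven coordinate for the Spark-Cassandra
    connector; pick the version from the connector's compatibility matrix.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["SPARK_HOME"] = spark_home
    # Assumption: Jupyter is installed and used as the PySpark driver front end.
    env["PYSPARK_DRIVER_PYTHON"] = "jupyter"
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
    argv = [
        os.path.join(spark_home, "bin", "pyspark"),
        "--packages",
        connector_package,
    ]
    return env, argv
```

To actually start the notebook, pass the result to `subprocess.run(argv, env=env, check=True)`.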
The connector uses the DataStax Java driver under the hood to move data between Apache Cassandra and Apache Spark. It should be co-located with Apache Cassandra and Apache Spark, all on the same node. The connector gathers data from Apache Cassandra by its known token ranges and pages that data into the Spark executors.
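To make the token-range idea concrete, here is a toy illustration in plain Python. It is not the connector's actual splitting logic. It assumes the default Murmur3Partitioner, whose tokens span -2^63 to 2^63-1, and the names `split_token_ring` and `range_query` are hypothetical.

```python
# Token bounds of Cassandra's default Murmur3Partitioner.
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def split_token_ring(num_splits):
    """Split the full token ring into contiguous (start, end] ranges."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN
    # Integer arithmetic keeps the bounds exact for 64-bit tokens.
    bounds = [MIN_TOKEN + (total * i) // num_splits for i in range(num_splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def range_query(keyspace, table, partition_key, start, end):
    """Build a CQL query that reads only the rows in one token range."""
    return (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE token({partition_key}) > {start} AND token({partition_key}) <= {end}"
    )


if __name__ == "__main__":
    for start, end in split_token_ring(4):
        print(range_query("test", "testing123", "id", start, end))
```

Each range could then be read by a separate Spark task, with the driver's paging controlling how many rows are fetched per round trip.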

Information about the Apache Spark Connector

The Apache Cassandra and Apache Spark Connector moves data back and forth between Apache Cassandra and Apache Spark so that the power of Apache Spark can be applied to the data.

The Spark-Cassandra connector

Be careful about the various versions of frameworks and libraries. There is a good version compatibility matrix on the GitHub wiki of the Spark-Cassandra connector. At the time of writing, the following versions were used: Cassandra 3.10, Scala 2.11.8, Spark 2.1.

Apache Spark

export SPARK_HOME="//spark-x.x.x-bin-hadoopx.x"
sbin/start-master.sh

We will use this keyspace and table later to validate the connection between Apache Cassandra and Apache Spark.

CREATE TABLE IF NOT EXISTS testing123 (id int, name text, city text, PRIMARY KEY (id));
INSERT INTO testing123 (id, name, city) VALUES (1, 'Amanda', 'Bay Area');
INSERT INTO testing123 (id, name, city) VALUES (2, 'Toby', 'NYC');
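The same table setup can be run from Python with cassandra-driver instead of cqlsh. This is a sketch under assumptions: Cassandra is listening on 127.0.0.1 with default settings, and the `CREATE KEYSPACE` statement is my guess at a single-node definition for the `test` keyspace, since the original statement is not shown here. The function names are hypothetical.

```python
SETUP_STATEMENTS = [
    # Assumption: single-node keyspace definition, not taken from the text.
    "CREATE KEYSPACE IF NOT EXISTS test "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}",
    "CREATE TABLE IF NOT EXISTS test.testing123 "
    "(id int, name text, city text, PRIMARY KEY (id))",
    "INSERT INTO test.testing123 (id, name, city) VALUES (1, 'Amanda', 'Bay Area')",
    "INSERT INTO test.testing123 (id, name, city) VALUES (2, 'Toby', 'NYC')",
]


def apply_statements(session, statements=SETUP_STATEMENTS):
    """Execute each CQL statement in order on an open driver session."""
    for cql in statements:
        session.execute(cql)


def setup_local_cassandra(host="127.0.0.1"):
    # Imported here so the statements above can be inspected without the driver.
    from cassandra.cluster import Cluster

    cluster = Cluster([host])
    try:
        apply_statements(cluster.connect())
    finally:
        cluster.shutdown()
```

Calling `setup_local_cassandra()` once should leave two rows in `test.testing123`, ready for the Spark read.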

Note: As with any set of install instructions, these will not work in all cases. Hopefully this works for you (as it did for me!), but if not, use it as a guide. Also, feel free to reach out and add comments on what worked for you!

apache-cassandra-x.x.x/bin/cassandra //This will start Cassandra

You can add apache-cassandra-x.x.x/bin to your PATH, but this is not required. For more information about non-default configurations, review the Apache Cassandra documentation.

Apache Cassandra - Apache Spark Connector

So you want to experiment with Apache Cassandra and Apache Spark to do some Machine Learning, awesome! But there is one downside: you need to create a cluster, or ask to borrow someone else's, to be able to do your experimentation. But what if I told you there is a way to install everything you need on one node, even on your laptop (if you are using Linux or Mac)?
