

Source: Basic set-up for distributed machine learning

After a struggle for a few hours, I finally installed Java 8 and Spark and configured all the environment variables. I went through a lot of Medium articles and StackOverflow answers, but no single answer or post solved all my problems, so this is a small effort of mine to put everything together. My machine runs Ubuntu 18.04, and I am using Java 8 along with Anaconda3. If you follow the steps, you should be able to install PySpark without any problem.

1. Make sure that you have Java installed. If you don't, run the following command in the terminal: sudo apt install openjdk-8-jdk
After installation, if you type java -version in the terminal you will get:
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-8u212-b03-0ubuntu1.18.04.1-b03)
OpenJDK 64-Bit Server VM (build 25.212-b03, mixed mode)
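As a quick sketch, here is step 1 in one place; the exact version and build strings in the output will differ slightly depending on your Ubuntu patch level:

    # Install Java 8 (skip if java -version already reports 1.8.x)
    sudo apt update
    sudo apt install openjdk-8-jdk

    # Verify the installation
    java -version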

2. Download Spark. Remember the directory where you downloaded it; I got it in my default downloads folder, which is also where I will install Spark.

3. Set the $JAVA_HOME environment variable. For this, run the following in the terminal: sudo vim /etc/environment
Then, in a new line after the PATH variable, add JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"
Later, in the terminal, run source /etc/environment
Don't forget to run the last line in the terminal, as that will create the environment variable and load it into the currently running shell. To check, type echo $JAVA_HOME in the terminal; the output should be: /usr/lib/jvm/java-8-openjdk-amd64
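For reference, a minimal sketch of what /etc/environment looks like after the edit. The PATH value shown is just the stock Ubuntu default and is an assumption here; keep whatever your file already contains and only add the JAVA_HOME line:

    # /etc/environment after the edit (PATH shown is the Ubuntu default)
    PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games"
    JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"

Then load and check it in the running shell:

    source /etc/environment
    echo $JAVA_HOME    # expected: /usr/lib/jvm/java-8-openjdk-amd64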

Now, some versions of Ubuntu do not run the /etc/environment file every time we open the terminal, so it's better to add it to the .bashrc file; the .bashrc file is loaded in the terminal every time it's opened. So run the following command in the terminal: vim ~/.bashrc
The file opens. At the end of it, add the line source /etc/environment (we will add the Spark variables below it later). Then load the .bashrc file in the terminal again by running source ~/.bashrc, or you can exit this terminal and create another.
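A sketch of the tail of ~/.bashrc after this step; the placeholder comment is only an assumption marking where the Spark variables will go later:

    # At the end of ~/.bashrc: load /etc/environment in every new terminal
    source /etc/environment

    # Spark variables (e.g. SPARK_HOME) will be added below this line later

Reload it into the current shell without closing the terminal:

    source ~/.bashrc
    echo $JAVA_HOME    # should still print /usr/lib/jvm/java-8-openjdk-amd64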
