
Top 10 Mistakes in the Cloudera Certification and How to Prepare for CCA175

Cloudera CCA175 is not very difficult if you are well prepared. I am going to tell you how you should prepare, what common mistakes people make, and how to avoid them.

Top 10 Mistakes:

1 - People copy the wrong path from different problems opened in multiple browser tabs. So get comfortable with Mozilla Firefox.

2 - People delete the data directory by mistake using hdfs dfs -rm -R where they actually had to import data, after copying the wrong path from a different problem. So be very careful: do not delete a path that stores data required for another problem.
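One habit that helps here: list the directory before any destructive command, so you can see exactly what you are about to remove. A minimal sketch (the paths below are hypothetical, not actual exam paths):

```shell
# List the directory first to confirm what you are about to delete
hdfs dfs -ls /user/cert/problem2/output

# Only then remove it; -R recurses, so a wrong path here is costly
hdfs dfs -rm -R /user/cert/problem2/output
```

Also avoid -skipTrash unless a problem explicitly requires it; the trash directory is your only undo if you delete the wrong path.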

3 - Don't panic; two hours are sufficient for the exam. I had 30 minutes left for revision even though my internet connection was resetting every 15 minutes.

4 - Always set the compression configuration back to uncompressed after solving a problem, or you may save a later file in a compressed format the problem does not ask for.
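In spark-shell this is just two setConf calls; a sketch assuming the Spark 1.x sqlContext that the exam cluster provides (the path is hypothetical, and df stands for whatever DataFrame you built for the problem):

```scala
// Enable gzip compression for a problem that asks for compressed Parquet
sqlContext.setConf("spark.sql.parquet.compression.codec", "gzip")
df.write.parquet("/user/cert/problem3/output")

// Reset to uncompressed afterwards, so the next problem's output
// is not silently written compressed
sqlContext.setConf("spark.sql.parquet.compression.codec", "uncompressed")
```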

5 - Do not use just a laptop; use an external monitor for a bigger screen. The cluster display is not high resolution, and you will make typing mistakes because it's hard to differentiate between , and " when the font size is small.

6 - Get comfortable with the environment: watch the video on the Cloudera certification page (https://www.cloudera.com/more/training/certification/cca-spark.html) and make sure you know how to increase the font size, because you will need that.

7 - Make sure you know how to use the sqoop eval command, because sometimes it's a problem to connect to the relational database (MySQL, in Cloudera's case). Make sure you can validate your results and inspect table structures using eval.
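A sketch of what that looks like; the host name, database, and credentials here are made up, not the actual exam values:

```shell
# Check connectivity and inspect a table's structure before importing it
sqoop eval \
  --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
  --username retail_user --password secret \
  --query "DESCRIBE orders"

# Sanity-check a row count the same way, to validate an import later
sqoop eval \
  --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
  --username retail_user --password secret \
  --query "SELECT COUNT(*) FROM orders"
```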

8 - Always copy the path when the program needs to save a file to a particular directory, and verify that the output in the specified directory is in the correct format. Do not spend too much time on a problem if you are stuck; you can come back to it after completing the other problems.
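Verifying the output only takes a few seconds; for example (with a hypothetical path):

```shell
# Confirm the output landed where the problem asked
hdfs dfs -ls /user/cert/problem5/solution

# Peek at the first records to confirm delimiter, format, and compression
hdfs dfs -cat /user/cert/problem5/solution/part-* | head -5
```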

9 - Get comfortable with the Sublime Text editor, because that is the default text editor available on the cluster, and persist your code, because sometimes the same code can be reused for other problems.

10 - Listen to others, but apply your own mind and use your own tricks.

How to Prepare:

It's not difficult to get Cloudera certified. I believe it just takes two to three months of preparation, depending on your background; three months should be enough for an average person to prepare for the exam.

The best resource I have found is Durga Gadiraju's tutorial on Udemy; it's the best for learning and preparing for the Cloudera certification. To practice scenarios, you can follow Arun's blog (http://arun-teaches-u-tech.blogspot.in/); it's really great. I did the same thing, but I also created my own case studies to prepare for this exam. You can use my GitHub account (https://github.com/tosarvesh/Bigdata_preparation), where you will find the scripts, documents, and scenarios I created to practice for the exam. It may not be well formatted, but you will get a good understanding of how to perform conversions, use data frames, etc., which is very helpful and sufficient to crack the exam.

There is no specific language you have to use to solve the Spark problems; they can be solved in Scala, Python, or whatever language you like. People who are comfortable in SQL often prefer to use DataFrames and write SQL to solve the problems. It's totally up to you how you solve them; Cloudera just checks and verifies the output.
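For instance, the SQL route is just registering a DataFrame as a temporary table and querying it. A sketch in Spark 1.x style (which is what the exam cluster ran); the file path and column layout are made up for illustration:

```scala
// Needed for toDF; spark-shell usually imports this for you
import sqlContext.implicits._

// Load a comma-delimited orders file into a DataFrame
// (the path and columns are hypothetical)
val orders = sc.textFile("/user/cert/orders")
  .map(_.split(","))
  .map(r => (r(0).toInt, r(1), r(3)))
  .toDF("order_id", "order_date", "status")

// Register it and solve the problem in plain SQL
orders.registerTempTable("orders")
val counts = sqlContext.sql(
  "SELECT status, COUNT(*) AS cnt FROM orders GROUP BY status")
counts.show()
```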

Launch spark shell: You can launch it with the spark-shell --master yarn command; that should be sufficient to solve the problems, because the data sets are not huge. If you are comfortable with additional parameters such as --num-executors and --executor-memory, you can pass those too. My strategy would be to launch a separate terminal with spark-shell --master yarn in case I hit a problem that loads a big file and performs a transformation, and let that run on its own, because if you over-consume the cluster's resources it can become a problem, and the proctor won't be able to help you out. So make sure you understand the parameters and know how to use them well.
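Concretely, the two launch styles look like this; the resource numbers are only an illustration, not recommended exam values:

```shell
# Minimal launch -- enough for the exam's data sizes
spark-shell --master yarn

# Optional resource tuning; only pass these if you understand them,
# since over-allocating can starve your other shells on the cluster
spark-shell --master yarn \
  --num-executors 2 \
  --executor-memory 2G \
  --executor-cores 2
```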

Please make yourself comfortable with file conversion, compression, and loading of different file formats. I am not sure there will be problems on Apache Kafka, Flume, or streaming, because those are hard to validate, but you should still have some knowledge of them.
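A typical conversion drill looks like this: read delimited text, build a DataFrame, and write it back in several formats. A Spark 1.x sketch with hypothetical paths and schema:

```scala
import sqlContext.implicits._

// Read a pipe-delimited products file (path and columns are made up)
val products = sc.textFile("/user/cert/products")
  .map(_.split("\\|"))
  .map(r => (r(0).toInt, r(1), r(2).toFloat))
  .toDF("id", "name", "price")

// Write the same data out in different formats
products.write.parquet("/user/cert/products_parquet")  // Parquet
products.write.json("/user/cert/products_json")        // JSON
products.rdd
  .map(r => r.mkString("\t"))                          // tab-delimited text
  .saveAsTextFile("/user/cert/products_tsv")
```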

For Avro you just need to pull in that dependency with import com.databricks.spark.avro._ . Everything else is available on the cluster; no need to import anything else.
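With that import in place, reading and writing Avro is one call each; a sketch with hypothetical paths, assuming the spark-avro package is on the cluster's classpath as described above:

```scala
// The import adds .avro(...) to the read/write APIs
import com.databricks.spark.avro._

// Read an Avro directory into a DataFrame, then write it back as Avro
val df = sqlContext.read.avro("/user/cert/input_avro")
df.write.avro("/user/cert/output_avro")
```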

One last tip: practice, practice, and practice. Work through the problems on Arun's blog and the examples on my GitHub account at least three to four times, and create your own scenarios and permutations of file conversions.

If you have any questions, you can comment on the blog below or message me via my LinkedIn profile: https://www.linkedin.com/in/guptasarvesh/

You can buy the Udemy course and avail the discount using the coupons available at this URL:

You can also sign up for lab access to good clusters using this URL:

