You are running a Hadoop cluster with a NameNode on host mynamenode. What are two ways to determine available HDFS space in your cluster?
Each node in your Hadoop cluster, running YARN, has 64GB of memory and 24 cores. Your yarn-site.xml has the following configuration:
You want YARN to launch no more than 16 containers per node. What should you do?
Your cluster has the following characteristics:
Which describes the file read process when a client application connects to the cluster and requests a 50MB file?
You are running a Hadoop cluster with a single NameNode and six DataNodes, and you want to change a configuration parameter so that it affects all six DataNodes. Which two steps must you perform? (Choose two)
You’re upgrading a Hadoop cluster from HDFS and MapReduce version 1 (MRv1) to one running HDFS and MapReduce version 2 (MRv2) on YARN. You want to set and enforce a block size of 128MB for all new files written to the cluster after the upgrade. What should you do?
Your cluster’s mapred-site.xml includes the following parameters:
And your cluster’s yarn-site.xml includes the following parameters:
What is the maximum amount of virtual memory allocated for each map task before YARN will kill its Container?
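The arithmetic behind this kind of question can be sketched as follows. When virtual memory checking is enabled, the NodeManager compares a container's virtual memory use against its physical memory allocation (mapreduce.map.memory.mb for a map task) multiplied by yarn.nodemanager.vmem-pmem-ratio. The function name and the numbers below are hypothetical, chosen only for illustration; they are not the parameter values from the question.

```python
def max_virtual_memory_mb(map_memory_mb: int, vmem_pmem_ratio: float) -> float:
    """Virtual memory ceiling (in MB) for one map task container.

    map_memory_mb   -- value of mapreduce.map.memory.mb (mapred-site.xml)
    vmem_pmem_ratio -- value of yarn.nodemanager.vmem-pmem-ratio (yarn-site.xml)
    """
    return map_memory_mb * vmem_pmem_ratio


# Hypothetical example values, NOT taken from the question above.
example_map_memory_mb = 4096
example_vmem_pmem_ratio = 2.0

limit_mb = max_virtual_memory_mb(example_map_memory_mb, example_vmem_pmem_ratio)
print(f"Container is killed above roughly {limit_mb:.0f} MB of virtual memory")
```

Substituting the actual values from the two configuration files gives the limit the question asks for.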
You want to understand more about how users browse your public website. For example, you want to know which pages they visit prior to placing an order. You have a server farm of 200 web servers hosting your website. Which is the most efficient process to gather these web server logs into your Hadoop cluster for analysis?
Which YARN daemon or service negotiates map and reduce Containers from the Scheduler, tracking their status and monitoring progress?