Lessons learned from deploying Cloudera Data Platform for IBM Cloud Pak for Data – IBM Developer


In this final blog post in our series, we focus on lessons learned from installing, maintaining, and verifying the connectivity of Cloudera Data Platform and IBM Cloud Pak for Data. If you haven’t read the first two posts, “A technical deep-dive on integrating Cloudera Data Platform and IBM Cloud Pak for Data” and “Installing Cloudera’s CDP Private Cloud Base on IBM Cloud with Ansible,” then I’d invite you to go back and read them for more context.

In this installment, we’d like to share some useful tips and tricks and show you how to avoid common mistakes made by first-time installers.

Lesson 1: Use a bastion host

Our Cloudera cluster had a total of 8 VMs (3 master nodes, 3 worker nodes, and 2 edge nodes). We wanted easy access to each node and wanted to limit public network traffic to the Cloudera cluster as much as possible. Luckily, there’s already a well-known solution to this problem: using a bastion host.

We spun up a small VM on the same subnet as our Cloudera cluster and could then easily communicate over private network interfaces (10.x.y.z IP addresses). For the installation process, this option provided the benefit of not dropping connections for long-running Ansible playbooks.
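One common way to make a bastion transparent is an SSH client configuration that uses ProxyJump (OpenSSH 7.3 or later). The following is a minimal sketch rather than our exact setup: the `bastion` alias, the `root` user, and the placeholder address are assumptions, and the `cid-vm-0*` pattern simply matches the node names used later in this post.

```shell
#!/usr/bin/env bash
# Print an SSH client config that routes cluster nodes through a bastion.
# All arguments are illustrative placeholders, not values from our environment.
bastion_ssh_config() {
  local bastion_addr="$1" user="$2" node_pattern="$3"
  cat <<EOF
Host bastion
  HostName ${bastion_addr}
  User ${user}

Host ${node_pattern}
  User ${user}
  ProxyJump bastion
  ServerAliveInterval 60
EOF
}

# Example: review the output first, then append it to ~/.ssh/config
# bastion_ssh_config "<bastion public ip>" root "cid-vm-0*" >> ~/.ssh/config
```

With a stanza like this in place, `ssh cid-vm-01` hops through the bastion automatically, and tools that read the SSH config pick up the same route.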

Figure 1. The architecture of our Cloudera for Cloud Pak for Data environment

Lesson 2: Use VS Code’s Remote Extension Plug-in

When installing Cloudera Data Platform with Ansible playbooks, you’re likely going to need to change a few config options and values in the playbooks. We’re not against using Vim, but we opted to use the Visual Studio Code Remote Development Extension Pack. This made searching through the files, editing values, and uploading and downloading files much easier.

Figure 2. VS Code’s Remote Development extension is useful for editing files and running commands against our remote machines

Lesson 3: Stick to private networks

This point may seem obvious, but it’s more about being consistent. Wherever an IP address had to be entered, we always made sure to use the private network IP address. This ensured that any traffic would stay on the IBM Cloud network and not go over the public Internet.
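A quick guard can catch a public address before it lands in an inventory or config file. This sketch assumes, as in our environment, that private addresses fall in the 10.x.y.z range. The function names are made up for illustration, and the pattern checks only the shape of the address, not that each octet is at most 255.

```shell
#!/usr/bin/env bash
# Succeed if the argument looks like a 10.x.y.z private IPv4 address.
# Note: this checks the shape only; it does not validate octet ranges.
is_private_10() {
  [[ "$1" =~ ^10\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]]
}

# Print a warning on stderr for every argument that is not a 10.x.y.z address.
warn_public_ips() {
  local ip
  for ip in "$@"; do
    is_private_10 "$ip" || echo "WARNING: $ip is not a 10.x.y.z address" >&2
  done
}

# Example: warn_public_ips 10.1.2.3 169.48.1.2
```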

Lesson 4: Eliminate all inbound traffic except RDP on the Windows Active Directory server

Here is a subtle lesson that might otherwise be a little tricky to pin down. After a few days of uptime, the health checks on our Cloudera Data Platform were indicating that the hosts could not reach our Active Directory (AD) server. Indeed, we discovered that our AD had hung. When we rebooted the AD server, things would return to normal for a day or so, and then the problem would repeat.

We looked over the capacity and performance of the server. When we looked at network usage, we noticed a high level of traffic going to and from the system on the Internet-facing interface. After looking at the server configuration and the traffic, we were able to determine that the vast majority of it was over the LDAP port.

Since our only use of LDAP is internal, the solution to this problem was to limit the inbound traffic to the AD by creating a rule that only allowed traffic on the RDP protocol, which is used for remote desktop administration. On IBM Cloud, we created a custom security group permitting inbound TCP on port 3389 for RDP.
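After applying a rule like this, it is worth confirming from a machine outside the private network that only RDP still answers. The sketch below relies on bash’s built-in /dev/tcp redirection and the coreutils timeout command. The function names and the `<AD public ip>` placeholder are illustrative, and 389 and 636 are the standard LDAP and LDAPS ports, which we assume match the AD setup.

```shell
#!/usr/bin/env bash
# Succeed if a TCP connection to host:port can be opened within 3 seconds.
port_open() {
  local host="$1" port="$2"
  # $0 and $1 inside the child shell are the host and port passed after the script.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Print one line per port: "<port> open" or "<port> unreachable".
report_ports() {
  local host="$1"; shift
  local port
  for port in "$@"; do
    if port_open "$host" "$port"; then
      echo "$port open"
    else
      echo "$port unreachable"
    fi
  done
}

# Example, run from outside the private network:
# report_ports "<AD public ip>" 389 636 3389
```

If the security group works as intended, only port 3389 should be reported as open from the public side.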

Lesson 5: Mount secondary drives to /data/dfs automatically

The storage requirements for installing Cloudera required us to purchase additional drives to go along with our virtual machines. These drives had to be mounted before running any playbooks. We used a little bit of bash and SSH to do it in an automated way. In our case, we chose to mount the drives to /data/dfs:

for i in {1..8}
do
  # Format the secondary drive as ext4 with no reserved blocks
  ssh cid-vm-0$i mkfs.ext4 -m0 -O sparse_super,dir_index,extent,has_journal /dev/xvdc
  # Create the mount point and mount the drive
  ssh cid-vm-0$i mkdir -p /data/dfs
  ssh cid-vm-0$i mount /dev/xvdc /data/dfs
  # Persist the mount across reboots
  ssh cid-vm-0$i 'echo "/dev/xvdc  /data/dfs   ext4  defaults,noatime 1 2" | tee -a /etc/fstab'
done
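One caveat with a loop like this is that running it a second time appends a duplicate line to the fstab file. A small guard makes that step safe to repeat. This is a sketch with a hypothetical helper name; it takes the fstab path as a parameter so it can be tried against a scratch file first, and it assumes the device path contains no regular-expression metacharacters.

```shell
#!/usr/bin/env bash
# Append an fstab entry only if the device is not already listed.
# Usage: add_fstab_entry <fstab file> <device> <mount point>
add_fstab_entry() {
  local fstab="$1" device="$2" mountpoint="$3"
  # Match the device at the start of a line, followed by whitespace.
  if grep -qE "^${device}[[:space:]]" "$fstab" 2>/dev/null; then
    echo "entry for ${device} already present in ${fstab}"
  else
    echo "${device}  ${mountpoint}   ext4  defaults,noatime 1 2" >> "$fstab"
  fi
}

# Example (as root on a node): add_fstab_entry /etc/fstab /dev/xvdc /data/dfs
```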

Lesson 6: Update OpenShift DNS operator so it knows the Cloudera node hostnames

We wanted our IBM Cloud Pak for Data instance, which runs on OpenShift, to be able to communicate with our newly deployed Cloudera Data Platform cluster. We stuck to our “always use private network interfaces” rule, but that resulted in 404s since OpenShift didn’t know how to resolve those hostnames. To get around this, we needed to edit the DNS operator on our OpenShift instance. It’s documented in the OpenShift DNS documentation, but for brevity, we’ve added what worked for us.

Edit the DNS operator default CR by running oc edit dns.operator/default, and add the following to the spec section:

spec:
  servers:
  - forwardPlugin:
      upstreams:
      - <your private ip>
      - <your public ip>
    name: cdplab-server
    zones:
    - cdplab.local
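If you would rather script this change than edit the CR interactively, the same spec can be applied with oc patch. The sketch below only builds the JSON merge patch; the function name is ours, and the arguments stand in for the values shown in the spec. Keep in mind that a merge patch replaces the whole spec.servers list, so this assumes no other forwarding servers are already configured.

```shell
#!/usr/bin/env bash
# Build a JSON merge patch for dns.operator/default.
# Usage: dns_forward_patch <server name> <zone> <upstream> [<upstream> ...]
dns_forward_patch() {
  local name="$1" zone="$2"; shift 2
  local upstreams="" ip
  # Join the upstream addresses into a comma-separated list of JSON strings.
  for ip in "$@"; do
    upstreams+="${upstreams:+,}\"${ip}\""
  done
  printf '{"spec":{"servers":[{"name":"%s","zones":["%s"],"forwardPlugin":{"upstreams":[%s]}}]}}\n' \
    "$name" "$zone" "$upstreams"
}

# Example (requires a logged-in oc session):
# oc patch dns.operator/default --type=merge \
#   -p "$(dns_forward_patch cdplab-server cdplab.local "<your private ip>")"
```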

Then verify that the configmap for CoreDNS is updated: oc get configmap/dns-default -n openshift-dns -o yaml

apiVersion: v1
data:
  Corefile: |
    # cdplab-server
    cdplab.local:5353 {
        forward . <your private ip> <your public ip>
    }

Then create a pod and try to access CDP from the pod. HTML should be returned, not a 404 error message.

bash-4.4$ curl -k https://cid-vm-01.cdplab.local:7183/cmf/home

Lesson 7: Ensure the AD self-signed certificate can be used as a certificate authority

This lesson can be broadly applied to other LDAP and AD scenarios. In our case, we could successfully connect to the Impala service running on Cloudera through Kerberos, but not through LDAP. After double-checking that our LDAP-specific Impala configuration was correct, we were still getting a not-so-helpful “Can’t contact LDAP server” error.

We slowly started to peel back the layers of the problem. We managed to isolate it to our LDAP configuration: when we ran ldapsearch in an attempt to bind the user, it gave us the same error message. Ah-ha! Impala was using an OpenLDAP library under the covers.

$ ldapsearch -H ldaps://cid-adc.cdplab.local:636 -D "stevemar@CDPLAB.LOCAL" -b "dc=cdplab,dc=local" '(uid=stevemar)' -W
Enter LDAP Password:
ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)

After double-checking that the Windows firewall wasn’t the culprit, we narrowed down the problem to a missing bit of information in the self-signed certificate we had created for the AD. We needed to add the -TextExtension "2.5.29.19={text}CA=true" flag to the Windows New-SelfSignedCertificate command. Our new command looked like this (previously, it was missing the last parameter):

# The final -TextExtension parameter sets basicConstraints (OID 2.5.29.19) to CA=true
New-SelfSignedCertificate -Subject *.$dnsName `
  -NotAfter $lifetime.AddDays(365) -KeyUsage DigitalSignature, KeyEncipherment `
  -Type SSLServerAuthentication -DnsName *.$dnsName, $dnsName `
  -TextExtension "2.5.29.19={text}CA=true"
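Once the certificate has been exported from the AD server, its basic constraints can be checked from any Linux node before pointing a client at it. This sketch assumes the certificate was exported in PEM format and that the openssl CLI is installed; the function name and the file name in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Succeed if the PEM certificate carries the basicConstraints CA:TRUE flag
# (OID 2.5.29.19, the extension that -TextExtension sets on Windows).
cert_is_ca() {
  openssl x509 -in "$1" -noout -text 2>/dev/null | grep -q 'CA:TRUE'
}

# Example:
# cert_is_ca ad-cert.pem && echo "usable as a CA" || echo "CA flag missing"
```

A certificate created without the extra parameter would be expected to fail this check, which matches the bind error we saw from ldapsearch.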

There’s no real single piece of advice here, other than this: if you’re going to use Kerberos to secure your Cloudera cluster, get familiar with Kerberos concepts, like keytabs, and tools like ktutil and ktpass.

Summary and next steps

We hope you enjoyed reading about some of the pitfalls we encountered and that you remember some of the tips we shared the next time you’re deploying a data and AI platform. You can learn more about the Cloudera Data Platform for IBM Cloud Pak for Data joint offering.
