Security Troubleshooting Guide
This document details common issues with setting up SSL and Kerberos or LDAP. Follow the steps below to try to diagnose your issue. The best way to debug issues is to be able to change configuration to verify what parts are correct. However, if you have read-only access to a cluster, or cannot restart Presto, follow as many of the steps below as you can. If you cannot figure out the issue with the guide, please include the following when asking for assistance on issues related to Kerberos/LDAP or SSL:
- The Presto configuration files
- The output from all of the steps in this guide (anonymized if necessary)
- The CLI script being used to connect to Presto or the JDBC connection string
- Any stack traces that are available
To collect the Presto config files, give the output from the following command (or give all of the configs in the etc directory):
./presto-admin configuration show
If possible, please also give the output from the following command, to allow debugging of network interfaces:
ifconfig
First, ensure that the cluster has the config that you think it has, and enable debug logging.
- Enable debug logging on the Presto server. Create ~/.prestoadmin/coordinator/log.properties with the following line:
io.prestosql.server.security=DEBUG
- Re-deploy the Presto configuration to ensure that all of the config properties in ~/.prestoadmin are in effect:
./presto-admin configuration deploy
./presto-admin server restart
- Make sure that the Presto server is properly started by running the following command:
./presto-admin server status
If Presto fails to start, check the /var/log/presto/server.log file and record the stack trace at the end of it.
Alternatively, without presto-admin, add the logging line above (io.prestosql.server.security=DEBUG) to etc/log.properties on all Presto nodes.
Troubleshooting SSL
If you see the following error with the JDBC driver, you probably have an issue with SSL:
[Simba][Presto](100073) Error fetching JSON content: No content to map due to end-of-input at [Source: ; line: 1, column: 1]
If you get the following exception, you probably have a problem with SSL:
javax.net.ssl.SSLHandshakeException: General SSLEngine problem
If it fails with a 404 “cannot connect” error, you have an SSL issue (or maybe the Presto server isn't started).
If you get the following exception, you have an issue either with SSL or Kerberos/LDAP setup:
java.nio.channels.ClosedChannelException
Comment out the Kerberos or LDAP config properties and see if it is possible to connect to Presto via HTTPS alone, without Kerberos (or LDAP). If you are still unable to connect to the Presto server, verify the following:
- Your keystore file is the same for the CLI/driver and the Presto server
- The keystore file is readable by the presto user
- If you are connecting via IP address, that your keystore allows for it. Java is particularly strict with SSL security, so you need the “-ext san=ip:” parameter for the SSL certificate, in addition to making the IP address the Common Name:
keytool -genkeypair \
  -alias presto \
  -keyalg RSA \
  -keystore /etc/presto/keystore.jks \
  -keypass password \
  -storepass password \
  -dname "CN=, OU=, O=, L=, S=, C=" \
  -ext san=ip:
- If connecting via the Simba JDBC driver, make sure that the Presto remote service name is HTTP (specified by http.server.authentication.krb5.service-name in config.properties)
- Make sure that the CN shown in the following command matches the hostname you're connecting to on the CLI or via a driver:
keytool -list -v -keystore keystore.jks
NOTE: If you would like to be able to connect either via hostname or via IP, you will need two aliases in the same keystore file: one with the hostname as the CN and one with the IP address as the CN, each generated with the -ext san=ip: option.
If unsuccessful, collect a verification of the above steps and include it with the information specified in the top section when asking for further assistance.
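The CN comparison in the last check can be scripted. The following is a minimal Python sketch, assuming that `keytool -list -v` prints one `Owner:` line per certificate; the function names and the sample text are illustrative and are not real output from a cluster.

```python
import re

def owner_common_names(keytool_output):
    """Return the CN of every 'Owner:' line in `keytool -list -v` output."""
    names = []
    for line in keytool_output.splitlines():
        match = re.match(r"\s*Owner:\s*(.*)", line)
        if not match:
            continue
        # Naive split on commas: fine for simple DNs, but it does not
        # handle escaped commas inside attribute values.
        for part in match.group(1).split(","):
            key, _, value = part.strip().partition("=")
            if key.upper() == "CN":
                names.append(value.strip())
    return names

def cn_matches(keytool_output, connect_host):
    """True if any certificate owner CN equals the host used to connect."""
    wanted = connect_host.lower()
    return any(cn.lower() == wanted for cn in owner_common_names(keytool_output))

# Hypothetical sample, shaped like keytool output.
SAMPLE = """Alias name: presto
Owner: CN=presto-coordinator.example.com, OU=Eng, O=Example, L=City, ST=State, C=US
Issuer: CN=presto-coordinator.example.com, OU=Eng, O=Example, L=City, ST=State, C=US
"""

print(cn_matches(SAMPLE, "presto-coordinator.example.com"))
print(cn_matches(SAMPLE, "10.0.0.5"))
```

If the second call describes how you connect (by IP address), the keystore in this sample would not allow it.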
First, see the documentation for Kerberos: https://prestosql.io/docs/current/security/server.html, https://prestosql.io/docs/current/security/cli.html, and https://prestosql.io/docs/current/connector/hive-security.html#kerberos-support. If that doesn't help, proceed step-by-step through verifying that SSL, frontend Kerberos and Kerberos for the Hive connector are all functioning properly.
If you see an error like the following, or any other error not described in the SSL steps above, you probably have a Kerberos issue:
Error starting query at https://severname.company.com:7778/v1/statement returned an invalid response: JsonResponse{statusCode=401, statusMessage=Unauthorized, headers={Content-Length=[0], Date=[Thu, 30 Jun 2016 21:47:01 GMT], WWW-Authenticate=[Negotiate realm="presto"]}, hasValue=false, value=null}
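The error signatures quoted so far can be collected into a small lookup for a first guess at where to look. This is a rough Python sketch of that heuristic; the table and the function name are illustrative, and a match is only a hint, not a diagnosis.

```python
# Substrings quoted in this guide, mapped to the area to investigate first.
SIGNATURES = [
    ("No content to map due to end-of-input", "SSL"),
    ("SSLHandshakeException", "SSL"),
    ("ClosedChannelException", "SSL or Kerberos/LDAP"),
    ("WWW-Authenticate=[Negotiate", "Kerberos"),
]

def likely_area(error_text):
    """Return the first matching area, or 'unknown' if nothing matches."""
    for needle, area in SIGNATURES:
        if needle in error_text:
            return area
    return "unknown"

print(likely_area("java.nio.channels.ClosedChannelException"))
```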
If you are able to connect with just SSL, the issue lies with Kerberos. Make sure the Kerberos configuration is correct on the Presto coordinator and all worker nodes. For an explanation of the properties, refer to https://prestosql.io/docs/current/security/server.html, https://prestosql.io/docs/current/security/cli.html, and https://prestosql.io/docs/current/connector/hive-security.html#kerberos-support.
Next, try querying the system connector in Presto via the Presto CLI, to determine whether the issue is with the Presto client/Presto server Kerberos or the Hive Kerberos.
First, create a CLI script to connect to your Kerberized Presto server (a sample CLI script can be found here: https://prestosql.io/docs/current/security/cli.html#presto-cli-execution).
Then run the following query on that CLI:
select * from system.runtime.nodes;
If that query works, you have a problem with how Kerberos is configured for the Hive connector. See the General Kerberos Debugging section below and then skip ahead to the Hive Kerberos Debugging section. If that query does not work, you have a problem with how Kerberos is configured for the frontend (and possibly also hidden problems with the Hive Kerberos setup). See the General Kerberos Debugging section below and proceed through Frontend Kerberos Debugging.
General Kerberos Debugging
- Ensure that all keytabs and keystore files are readable by the presto user (you may need to chown and/or chmod them).
- Ensure that all keytabs and keystore files are on all of the nodes of the cluster in the location specified by the configs, and that you can run kinit -kt as the proper principal with the keytab files. For example, if the CLI specifies the following:
--krb5-principal someuser@EXAMPLE.COM --krb5-keytab-path /home/someuser/someuser.keytab
You can verify this with the following set of commands:
$ kinit -kt /home/someuser/someuser.keytab someuser@EXAMPLE.COM
$ klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: someuser@EXAMPLE
Valid starting Expires Service principal
04/05/17 08:04:55 04/06/17 08:04:54 krbtgt/EXAMPLE@EXAMPLE
renew until 04/05/17 08:04:55
- Turn on Kerberos debugging by adding the following configs to Presto’s jvm.config (and restart Presto) and to the CLI command you’re running (see https://prestosql.io/docs/current/security/server.html#troubleshooting):
-Dsun.security.krb5.debug=true
-Dlog.enable-console=true
- Ensure that you can connect to the KDC from the Presto coordinator using telnet:
telnet kdc.example.com 88
- Ensure that the /etc/krb5.conf file is the same on all of the nodes, and that it contains the same realm that you are using for the principals.
- Ensure that you have either installed the Java Cryptography Extension Policy Files (https://prestosql.io/docs/current/security/server.html#java-cryptography-extension-policy-files) or configured your /etc/krb5.conf file to use less secure keys.
- Kerberos relies heavily on DNS resolution, so make sure that a) DNS is set up correctly for your cluster or b) your /etc/hosts file is configured properly, in the form:
<ip address> <fqdn> <optional alias>
- Ensure all hostnames contain only lower case characters, as upper case characters are not permitted.
- Kerberos is highly dependent on your clocks being synchronized -- make sure that the clocks on all of the Presto servers and on the KDC are the same. We recommend enabling NTP. If the clocks are not in sync, you may get a failure like the following (visible when Kerberos debugging is turned on):
>>>>KRBError:
cTime is Fri Nov 14 10:31:21 EST 2008 1226655081000
sTime is Mon Jan 23 15:12:12 EST 2017 1484921532000
suSec is 360002
error code is 25
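To judge how far apart the two clocks are, compare the epoch-millisecond values printed after cTime and sTime. The following Python sketch does that arithmetic; the 300-second tolerance is an assumption (a common Kerberos default), so check the value configured for your realm.

```python
def clock_skew_seconds(c_time_ms, s_time_ms):
    """Absolute difference between client and server timestamps, in seconds."""
    return abs(s_time_ms - c_time_ms) / 1000.0

def skew_too_large(c_time_ms, s_time_ms, max_skew_seconds=300):
    """True if the skew exceeds the (assumed) tolerance."""
    return clock_skew_seconds(c_time_ms, s_time_ms) > max_skew_seconds

# Epoch-millisecond values copied from the KRBError sample above.
C_TIME_MS = 1226655081000
S_TIME_MS = 1484921532000

print(clock_skew_seconds(C_TIME_MS, S_TIME_MS))
print(skew_too_large(C_TIME_MS, S_TIME_MS))
```

For the sample values, the difference is several years, far beyond any reasonable tolerance.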
Frontend Kerberos Debugging
- Ensure that the Presto server keytab contains <service principal>/<fqdn of presto coordinator>, where the service principal is the same as http.server.authentication.krb5.service-name. If you're trying to connect via JDBC, make sure that http.server.authentication.krb5.service-name is HTTP. If connecting via the CLI, make sure that http.server.authentication.krb5.service-name matches the “--krb5-remote-service-name” option on the CLI.
- Make sure your property names are correct: e.g. the keytab property name is http.server.authentication.krb5.keytab and the service name property is http.server.authentication.krb5.service-name.
- Make sure that you are using the keytab file whose key version number (KVNO) matches the current key of the service principal. The proper keytab file for a service principal may change if you generate a keytab file with a random key.
To check this, kinit as the client user (in this case, prestoclient) and then try kvno with the Presto service principal (in this case, HTTP/presto-master-node). This checks whether you can get a service ticket. The commands will look something like the following:
presto-master-node:/etc/presto # kinit -kt /etc/security/keytabs/prestoclient.keytab prestoclient
presto-master-node:/etc/presto # kvno HTTP/presto-master-node
HTTP/presto-master-node@REALM: kvno = 5
presto-master-node:/etc/presto # klist -kt /etc/security/keytabs/HTTP.keytab
Keytab name: FILE:/etc/security/keytabs/HTTP.keytab
KVNO Timestamp Principal
---- -----------------
4 06/24/16 19:51:20 HTTP/presto-master-node@REALM
4 06/24/16 19:51:20 HTTP/presto-master-node@REALM
4 06/24/16 19:51:20 HTTP/presto-master-node@REALM
4 06/24/16 19:51:20 HTTP/presto-master-node@REALM
presto-master-node:/etc/presto # klist -kt /etc/security/keytabs/http.keytab
Keytab name: FILE:/etc/security/keytabs/http.keytab
KVNO Timestamp Principal
---- -----------------
5 06/30/16 16:29:53 HTTP/presto-master-node@REALM
5 06/30/16 16:29:53 HTTP/presto-master-node@REALM
5 06/30/16 16:29:53 HTTP/presto-master-node@REALM
5 06/30/16 16:29:53 HTTP/presto-master-node@REALM
You should use the second keytab file (http.keytab) in config.properties for the server, because the kvno is 5, matching the output from the kvno command.
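The comparison between the kvno output and the keytab listings can be automated. This is a minimal Python sketch that parses text shaped like the sample output above; the function names are illustrative, and the parsing assumes the `KVNO DATE TIME PRINCIPAL` row layout shown in that sample.

```python
import re

def ticket_kvno(kvno_output):
    """Extract N from a line such as 'HTTP/host@REALM: kvno = N'."""
    match = re.search(r"kvno\s*=\s*(\d+)", kvno_output)
    return int(match.group(1)) if match else None

def keytab_kvnos(klist_output, principal):
    """Collect the KVNOs that `klist -kt` lists for one principal."""
    kvnos = set()
    for line in klist_output.splitlines():
        fields = line.split()
        # Entry rows look like: KVNO DATE TIME PRINCIPAL
        if len(fields) >= 4 and fields[0].isdigit() and fields[-1] == principal:
            kvnos.add(int(fields[0]))
    return kvnos

def keytab_matches(kvno_output, klist_output, principal):
    """True if the keytab holds a key with the KVNO the KDC reports."""
    wanted = ticket_kvno(kvno_output)
    return wanted is not None and wanted in keytab_kvnos(klist_output, principal)

# Trimmed copies of the sample output above.
KVNO_OUT = "HTTP/presto-master-node@REALM: kvno = 5"
OLD_KEYTAB = """Keytab name: FILE:/etc/security/keytabs/HTTP.keytab
KVNO Timestamp Principal
---- -----------------
   4 06/24/16 19:51:20 HTTP/presto-master-node@REALM
"""
NEW_KEYTAB = """Keytab name: FILE:/etc/security/keytabs/http.keytab
KVNO Timestamp Principal
---- -----------------
   5 06/30/16 16:29:53 HTTP/presto-master-node@REALM
"""

PRINCIPAL = "HTTP/presto-master-node@REALM"
print(keytab_matches(KVNO_OUT, OLD_KEYTAB, PRINCIPAL))
print(keytab_matches(KVNO_OUT, NEW_KEYTAB, PRINCIPAL))
```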
Hive Kerberos Debugging
Ensure that each Hive-related keytab contains the same principal as the one specified in the config file, e.g. hive.metastore.presto.principal should match hive.metastore.presto.keytab, hive.hdfs.presto.principal should match hive.hdfs.presto.keytab, etc.
Then, verify that the keytab file can successfully be used to obtain a ticket for that principal (see the General Kerberos Debugging section on how to do that).
In Presto versions > 157-t, it is possible to use the _HOST keyword in the config file, e.g. hive.metastore.presto.principal=hive/_HOST@REALM.
In Presto versions <= 157-t, if your principals contain hostnames, it's necessary to specify a different config file on each node, each with the proper host for the principal (e.g. hive.metastore.presto.principal=hive/node-1@REALM on node-1, hive.metastore.presto.principal=hive/node-2@REALM on node-2).
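For the older versions, the per-node property lines can be generated rather than typed by hand. This Python sketch substitutes a hostname for _HOST in a principal template; it illustrates the substitution described above and is not Presto's own implementation. The node names are the ones used in the example.

```python
import socket

def expand_host(principal, hostname=None):
    """Replace _HOST in a principal with a node's hostname."""
    if hostname is None:
        # Fall back to this machine's fully qualified name.
        hostname = socket.getfqdn()
    return principal.replace("_HOST", hostname)

# One config line per node, as needed when _HOST is not supported.
for node in ["node-1", "node-2"]:
    line = "hive.metastore.presto.principal=" + expand_host("hive/_HOST@REALM", node)
    print(node, line)
```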
First, see the documentation for LDAP: https://prestosql.io/docs/current/security/ldap.html. If that doesn't help, proceed step-by-step through verifying that SSL (see the “Troubleshooting SSL” section) and LDAP are both functioning properly.
If you are able to connect with just SSL, the issue lies with LDAP. Make sure the LDAP configuration is correct on the coordinator node. For an explanation of the properties, refer to https://prestosql.io/docs/current/security/ldap.html.
Common errors
- If Presto server refuses to start with error: Connection refused/timeout. Solution: Make sure the LDAP Server is reachable (value set for authentication.ldap.url in config.properties)
- If Presto server refuses to start with error: LDAP without SSL/TLS unsupported. Expected ldaps:// Solution: authentication.ldap.url property must start with ‘ldaps://’
- If Presto server refuses to start with error: “Anonymous bind failed” or “Unable to find valid certification path to requested target”
Solution: This happens if the LDAP server’s SSL certificate is missing from, or incorrectly imported into, the coordinator’s default Java truststore. Try re-importing the certificate into the truststore on the coordinator (assumes Java is at /usr/java/default/):
/usr/java/default/jre/bin/keytool -import -alias ldapcertificate -storepass changeit -keystore /usr/java/default/jre/lib/security/cacerts -noprompt -trustcacerts -file <ldap-certificate.pem>
- If you see the following error with CLI/driver, then either the username or password is incorrect.
Error starting query at https://server.company.com:8443/v1/statement returned HTTP response code 401.
Response info:
JsonResponse{statusCode=401, statusMessage=Invalid credentials: [LDAP: error code 49 - Invalid Credentials], headers={Cache-Control=[must-revalidate,no-cache,no-store], Content-Length=[354], Date=[Thu, 06 Apr 2017 21:59:14 GMT], WWW-Authenticate=[Basic realm="presto"], Content-Type=[text/html;charset=ISO-8859-1]}, hasValue=false, value=null}
This can also happen if the authentication.ldap.user-bind-pattern property is incorrect. For example, suppose the LDAP user object looks like the following in Active Directory:
# Authorized User, Asia, presto.testldap.com **
dn: CN=Authorized User,OU=Asia,DC=presto,DC=testldap,DC=com
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: user
cn: Authorized User
sn: User
givenName: Authorized
distinguishedName: CN=Authorized User,OU=Asia,DC=presto,DC=testldap,DC=com
memberOf: CN=AuthorizedGroup,OU=America,DC=presto,DC=testldap,DC=com
name: Authorized User
sAMAccountName: authorizeduser
** Please note that the LDAP object terminology and tree structure will be different for other LDAP server implementations, such as OpenLDAP.
In that case, authentication.ldap.user-bind-pattern must be set to ${USER}@presto.testldap.com and the --user connecting from CLI/Driver must be authorizeduser.
- If you see the following error with CLI/driver, then it may mean that the LDAP group authorization failed.
Error starting query at https://server.company.com:8443/v1/statement returned HTTP response code 401.
Response info:
JsonResponse{statusCode=401, statusMessage=Unauthorized user: User <user> not a member of the authorized group, headers={Cache-Control=[must-revalidate,no-cache,no-store], Content-Length=[376], Date=[Thu, 06 Apr 2017 22:01:44 GMT], WWW-Authenticate=[Basic realm="presto"], Content-Type=[text/html;charset=ISO-8859-1]}, hasValue=false, value=null}
For example, if there is an Active Directory group object AuthorizedGroup which has to be authorized to Presto, then the --user AuthorizedUser connecting via CLI/Driver must be a member of this group.
# AuthorizedGroup, Asia, presto.testldap.com
dn: CN=AuthorizedGroup,OU=America,DC=presto,DC=testldap,DC=com
objectClass: top
objectClass: group
cn: AuthorizedGroup
member: CN=AuthorizedUser,OU=Asia,DC=presto,DC=testldap,DC=com
distinguishedName: CN=AuthorizedGroup,OU=America,DC=presto,DC=testldap,DC=com
instanceType: 4
sAMAccountName: AuthorizedGroup
For the AuthorizedGroup to be authorized, authentication.ldap.group-auth-pattern must be set to (&(objectClass=person)(samAccountName=${USER})(memberof=CN=AuthorizedGroup,OU=America,DC=presto,DC=testldap,DC=com))
authentication.ldap.user-base-dn must be set to the base distinguished name of the AuthorizedUser which is OU=Asia,DC=presto,DC=testldap,DC=com (or DC=presto,DC=testldap,DC=com or even DC=testldap,DC=com etc.)
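To see what a given user name turns into, you can expand the two patterns by hand. This Python sketch assumes plain textual replacement of the ${USER} placeholder; any escaping or other processing that Presto applies is not modeled here.

```python
def render(pattern, user):
    """Substitute the literal ${USER} placeholder in a config pattern."""
    return pattern.replace("${USER}", user)

# Values taken from the examples above.
BIND_PATTERN = "${USER}@presto.testldap.com"
GROUP_PATTERN = (
    "(&(objectClass=person)(samAccountName=${USER})"
    "(memberof=CN=AuthorizedGroup,OU=America,DC=presto,DC=testldap,DC=com))"
)

print(render(BIND_PATTERN, "authorizeduser"))
print(render(GROUP_PATTERN, "authorizeduser"))
```

The second line of output is the kind of search filter you can use to check group membership against the LDAP server.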
ldapsearch is a utility that can be used to debug LDAP-related issues. You can install it from the ldap-utils or openldap-clients package (e.g. yum install openldap-clients). If you have access to the LDAP server, you can use this tool to query it from the Presto coordinator and view the objects (users, groups, etc.) it contains. This can be used to verify that the LDAP properties in config.properties are valid.
For example, to return all user objects in OrgUnit=Asia:
ldapsearch -H ldap://<ldap_server_ip> -x -b "OU=Asia,DC=presto,DC=testldap,DC=com" "(objectClass=person)"
Another example, which verifies whether the LDAP group authorization passes (to check if authentication.ldap.group-auth-pattern and authentication.ldap.user-base-dn are valid):
ldapsearch -H ldap://<ldap_server_ip> -x -b "OU=Asia,DC=presto,DC=testldap,DC=com" "(&(objectClass=person)(samAccountName=AuthorizedUser)(memberof=CN=AuthorizedGroup,OU=America,DC=presto,DC=testldap,DC=com))"
The above query returns an entry if and only if the user AuthorizedUser (with user-base-dn OU=Asia,DC=presto,DC=testldap,DC=com) is a member of the group AuthorizedGroup.
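LDAP filters contain parentheses and ampersands, which the shell interprets unless they are quoted correctly. This Python sketch builds the ldapsearch command line with safe quoting, using only the options shown above; the server address 10.0.0.1 is a placeholder.

```python
import shlex

def ldapsearch_command(server, base_dn, ldap_filter):
    """Build an ldapsearch command line with shell-safe quoting."""
    argv = ["ldapsearch", "-H", "ldap://" + server, "-x", "-b", base_dn, ldap_filter]
    return " ".join(shlex.quote(arg) for arg in argv)

# 10.0.0.1 is a hypothetical LDAP server address.
print(ldapsearch_command(
    "10.0.0.1",
    "OU=Asia,DC=presto,DC=testldap,DC=com",
    "(objectClass=person)",
))
```

The printed command can be pasted into a shell on the coordinator.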