Ticket: https://hysds-core.atlassian.net/browse/HC-259

Background information required to understand the thought process and planning behind integrating authentication and authorization (A&A) into HySDS

OpenID Connect 1.0

… a simple identity layer on top of the OAuth 2.0 protocol. It allows Clients to verify the identity of the End-User based on the authentication performed by an Authorization Server, as well as to obtain basic profile information about the End-User in an interoperable and REST-like manner.

It defines a set of standards that allow for single sign-on (SSO) in a secure manner

OpenID Connect lets a user sign in to one application and then access every other application protected by the same provider without signing in again

OpenID Connect provides multiple types of “flows” to authenticate users:
- Implicit flow - a simplified OAuth flow, previously recommended for native apps and JavaScript apps, in which the access token is returned immediately without an extra authorization code exchange step
  - hysds_ui will use the implicit flow because it is a pure JavaScript (front-end) application
- Authorization code flow - redirects users to log in at an identity provider (Google, Facebook, etc.)
  - returns an authorization code, which your app’s backend then exchanges for an id_token, access_token and refresh_token
- there are more, but the authorization code flow and the implicit flow are the most common
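As a rough illustration of the first step of the authorization code flow, the sketch below builds the URL the user's browser is redirected to. The endpoint, client_id and redirect_uri values are hypothetical placeholders; the query parameter names (response_type, client_id, redirect_uri, scope, state) come from the OAuth 2.0 / OpenID Connect specifications.

```python
import secrets
from urllib.parse import urlencode


def build_authorization_url(authorization_endpoint, client_id, redirect_uri,
                            scope="openid profile", state=None):
    """Return (url, state) for redirecting a user to the identity provider.

    The provider sends the user back to redirect_uri with ?code=...&state=...;
    the backend should confirm the returned state matches before exchanging
    the code for tokens.
    """
    # A random state value ties the callback to this login attempt (CSRF protection)
    state = state or secrets.token_urlsafe(16)
    params = {
        "response_type": "code",  # ask for an authorization code, not tokens
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,           # "openid" is required for OpenID Connect
        "state": state,
    }
    return f"{authorization_endpoint}?{urlencode(params)}", state


# Hypothetical values, for illustration only
url, state = build_authorization_url(
    "https://idp.example.com/authorize",
    "hysds-ui",
    "https://app.example.com/callback",
)
```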

The access_token & refresh_token will be used to grant access to clients

JWT tokens
- JWT tokens allow systems to encode a JSON object into the token itself, which the application can decode to retrieve user information (email, username, roles, etc.)
- JWT tokens are separated into 3 sections:
  - Header - info on the signing algorithm
  - Payload - user info (email, username, roles, etc.)
  - Signature - created by signing the encoded header and the encoded payload with a secret (or private key), using the algorithm specified in the header
    - it lets the application detect whether the token has been tampered with (and ultimately reject the token)
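To make the three sections concrete, here is a minimal sketch that builds and checks an HS256 token using only the Python standard library. The helper names, the secret and the payload values are made up for illustration. It only demonstrates the header.payload.signature structure; a real service would use a maintained JWT library and also check claims such as the expiry.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(data: bytes) -> str:
    # JWTs use URL-safe base64 with the trailing "=" padding removed
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    # Restore the padding that was stripped during encoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(header: dict, payload: dict, secret: bytes) -> str:
    # Signing input is "<encoded header>.<encoded payload>"
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url_encode(signature)}"


def verify_hs256(token: str, secret: bytes) -> dict:
    header_b64, payload_b64, signature_b64 = token.split(".")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison; any change to header or payload breaks the match
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("signature mismatch: token was altered or signed with another key")
    return json.loads(b64url_decode(payload_b64))


# Made-up secret and payload, for illustration only
token = sign_hs256(
    {"alg": "HS256", "typ": "JWT"},
    {"email": "janedoe@example.com", "roles": ["admin"]},
    b"demo-secret",
)
claims = verify_hs256(token, b"demo-secret")
```

Swapping in a different payload segment while keeping the old signature makes `verify_hs256` raise, which is the tamper detection described above.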

JWT tokens are the best option for the “micro-service” architecture (multiple REST APIs) of HySDS because:

- one set of tokens (access_token, refresh_token) can be re-used by multiple applications
- tokens can be decoded and the payload can be read to enforce role-based access to REST API endpoints

According to OpenID Connect standards, user info can be retrieved in 2 ways:
- online method, making a request to the provider’s UserInfo endpoint
  - curl https://[provider endpoint]/.../protocol/openid-connect/userinfo
  - reliable, but it adds latency because an additional request is needed to retrieve user info every time your service is called (not scalable in the long run)
- decoding JWT token
  - the user info is encoded in the token and can be retrieved without having to make additional requests (more scalable)
  - requirements:
    - if using the HS256 algorithm, tokens are signed and verified with the same secret_key
      - less secure: every service that verifies tokens must hold the secret_key, and if it gets leaked anyone can create their own JWT tokens and potentially gain “superuser” access to your system
    - if using the RS256 algorithm (more secure):
      - tokens are signed with a private_key
      - tokens are verified with a public_key
      - example (using authlib):

from authlib.jose import jwt

public_key = """
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAi2W0DkV...
-----END PUBLIC KEY-----
"""
token = 'eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJ0czA0c...'

claims = jwt.decode(token, public_key)  # verifies the signature with the public key
claims.validate()  # will raise error if token expired
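For comparison, the online method (calling the provider's UserInfo endpoint) could look roughly like the sketch below. The endpoint URL is a placeholder and the function names are hypothetical; the only part taken from the OpenID Connect specification is that the access_token is sent as a Bearer token in the Authorization header.

```python
import json
from urllib.request import Request, urlopen


def build_userinfo_request(userinfo_endpoint: str, access_token: str) -> Request:
    # The access_token is presented as a Bearer token
    return Request(
        userinfo_endpoint,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
    )


def fetch_userinfo(userinfo_endpoint: str, access_token: str, timeout: float = 5.0) -> dict:
    # One extra network round trip per call; this is the latency cost noted above
    request = build_userinfo_request(userinfo_endpoint, access_token)
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Caching the result for the lifetime of the access_token is one way to soften the extra request, at the cost of slightly stale user info.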

Example of JWT token:

According to the OpenID Connect documentation, when your access_token expires you can use the refresh_token to retrieve a new token (include client_secret if the client type is confidential)

ex (with Keycloak):

curl -s -X POST \
  -d client_id=<client_id> \
  -d client_secret=<client_secret> \
  -d grant_type=refresh_token \
  -d refresh_token=<refresh_token> \
  "http://localhost:8080/auth/realms/<realm>/protocol/openid-connect/token" | python -m json.tool

response:

{
  "access_token": "eyJhbGciOiJSUzI1NiIsIn...",
  "expires_in": 300,
  "refresh_expires_in": 1800,
  "refresh_token": "eyJhbGciOiJIUzI1NiIsInR5cCI...",
  "token_type": "Bearer",
  "not-before-policy": 0,
  "session_state": "183ebafb-93ed-408f-a2ea-3708f518a694",
  "scope": "profile"
}
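The same refresh request can be sketched in Python with only the standard library. The function names are hypothetical; the form fields mirror the curl example, and expires_in is read from a response shaped like the one shown.

```python
import time
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_refresh_request(token_endpoint: str, client_id: str, refresh_token: str,
                          client_secret: Optional[str] = None) -> Request:
    form = {
        "grant_type": "refresh_token",
        "client_id": client_id,
        "refresh_token": refresh_token,
    }
    # Only confidential clients send a client_secret
    if client_secret is not None:
        form["client_secret"] = client_secret
    return Request(
        token_endpoint,
        data=urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def access_token_expiry(token_response: dict, issued_at: Optional[float] = None) -> float:
    # expires_in is a lifetime in seconds, so convert it to an absolute timestamp
    issued_at = time.time() if issued_at is None else issued_at
    return issued_at + token_response["expires_in"]
```

A client can store the value from `access_token_expiry` and refresh shortly before it is reached instead of waiting for a rejected request.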

There are multiple SSO providers that use OpenID Connect for A&A:

SSO Providers:

Keycloak

Originally the plan was to use Keycloak for A&A:

Pros:

- Able to handle separate role-level access with the use of JWT tokens
- Open-source (supported by Red Hat)
- LDAP integration
  - sync/import users from an LDAP group to the internal database

Cons:

- Requires a SQL database (MySQL, PostgreSQL, etc.)

Keycloak guide from Red Hat on how to set up realms, client apps and client roles

The guide uses Java’s Spring Boot framework for the REST API integration, but it can still be followed

OCIO advised against using Keycloak, instead suggesting AWS Cognito

[meeting] with OCIO where 4 other projects are also working on Jupyter notebook front-ends to PCMs. The topic was raised for FN and public access to be able to sign into ADE+PCM for on-demand use. As a heads up, OCIO is recommending not to use Keycloak and instead to use AWS Cognito with some additional ELB proxies

AWS Cognito

According to this StackOverflow post:

Cognito exposes an OpenID Connect Discovery endpoint as described at https://openid.net/specs/openid-connect-discovery-1_0.html#ProviderConfigurationRequest at the following location:
https://cognito-idp.{region}.amazonaws.com/{userPoolId}/.well-known/openid-configuration
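As a small sketch, the discovery URL can be built from the region and user pool ID, and the endpoints of interest can be picked out of the returned document. The function names are hypothetical; the metadata keys (issuer, jwks_uri, and so on) are the ones defined by the OpenID Connect Discovery specification, and a given provider may not publish all of them.

```python
def cognito_discovery_url(region: str, user_pool_id: str) -> str:
    # Follows the URL pattern quoted above
    return (
        f"https://cognito-idp.{region}.amazonaws.com/"
        f"{user_pool_id}/.well-known/openid-configuration"
    )


def pick_endpoints(discovery_document: dict) -> dict:
    # Keep only the metadata fields a client typically needs;
    # fields missing from the document come back as None
    wanted = (
        "issuer",
        "authorization_endpoint",
        "token_endpoint",
        "userinfo_endpoint",
        "jwks_uri",
    )
    return {key: discovery_document.get(key) for key in wanted}


url = cognito_discovery_url("us-west-2", "us-west-2_example")
```

The jwks_uri entry points at the public keys needed to verify RS256 token signatures.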

Because Cognito can be exposed as an OpenID Connect provider, a lot of what we have researched on OpenID Connect (specifically with Keycloak) can be applied to Cognito as well, but additional research on the implementation is still needed

Cognito doesn’t have clear instructions on how to sync your LDAP directory, so further research is needed. Related links:

JWT Tokens

Because AWS Cognito supports OpenID Connect, it supplies users with an id_token, a refresh_token and an access_token

example of an access_token payload:

{
  "sub": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "device_key": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "cognito:groups": [
    "admin"
  ],
  "token_use": "access",
  "scope": "aws.cognito.signin.user.admin",
  "auth_time": 1562190524,
  "iss": "https://cognito-idp.us-west-2.amazonaws.com/us-west-2_example",
  "exp": 1562194124,
  "iat": 1562190524,
  "jti": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "client_id": "57cbishk4j24pabc1234567890",
  "username": "janedoe@example.com"
}

Subject (sub)

The sub claim is a unique identifier (UUID) for the authenticated user. It is not the same as the user name, which may not be unique.

Amazon Cognito groups (cognito:groups)

The cognito:groups claim is a list of groups the user belongs to (can be treated the same as roles)

Authentication time (auth_time)

The auth_time claim contains the time when the authentication occurred. Its value is a JSON number that represents the number of seconds from 1970-01-01T0:0:0Z as measured in UTC format. On refreshes, it represents the time when the original authentication occurred, not the time when the token was issued.

Issuer (iss)

The iss claim has the following format: https://cognito-idp.{region}.amazonaws.com/{userPoolId}
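Once a token's signature has been verified, the claims described above can drive an access decision. The sketch below is one possible check, assuming the claims have already been decoded into a dict like the example payload; the function name and the choice of PermissionError are illustrative, and it does not verify the signature itself.

```python
import time
from typing import Optional


def check_cognito_access_claims(claims: dict, region: str, user_pool_id: str,
                                required_group: str, now: Optional[float] = None) -> str:
    """Return the user's sub if the claims pass every check, else raise PermissionError."""
    now = time.time() if now is None else now

    # The issuer must be the expected user pool
    expected_iss = f"https://cognito-idp.{region}.amazonaws.com/{user_pool_id}"
    if claims.get("iss") != expected_iss:
        raise PermissionError("unexpected issuer")

    # Only accept access tokens here, not id tokens
    if claims.get("token_use") != "access":
        raise PermissionError("not an access token")

    # exp is expressed in seconds since the Unix epoch
    if claims.get("exp", 0) <= now:
        raise PermissionError("token expired")

    # Treat cognito:groups as the user's roles
    if required_group not in claims.get("cognito:groups", []):
        raise PermissionError(f"user is not in group {required_group!r}")

    return claims["sub"]
```

Returning sub rather than username follows the note above that sub is the unique identifier.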

In cases where a user needs to obtain a set of tokens directly with a username and password (otello, mozart + grq2 REST APIs), we can leverage boto3 to obtain them (as demonstrated in this StackOverflow post):

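import boto3


def authenticate_and_get_token(username: str, password: str,
                               user_pool_id: str, app_client_id: str) -> None:
    """Log in to a Cognito user pool with a username and password and print the tokens."""
    client = boto3.client('cognito-idp')

    # Admin (server-side) flow: the password is sent directly instead of using SRP,
    # so the caller needs AWS credentials and the app client must allow this flow
    resp = client.admin_initiate_auth(
        UserPoolId=user_pool_id,
        ClientId=app_client_id,
        AuthFlow='ADMIN_NO_SRP_AUTH',
        AuthParameters={
            "USERNAME": username,
            "PASSWORD": password
        }
    )

    # AuthenticationResult holds the issued tokens; it is absent if Cognito
    # responds with a challenge (e.g. a forced password change) instead
    print("Log in success")
    print("Access token:", resp['AuthenticationResult']['AccessToken'])
    print("ID token:", resp['AuthenticationResult']['IdToken'])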

ElasticSearch

Authenticating ElasticSearch directly would require a major update in the HySDS core (hysds_commons, hysds) to fetch an access_token for every background process & celery worker

Configure ElasticSearch for OpenID Connect authentication

An alternative is to authenticate at the proxy (apache or nginx) level:

- This is a work in progress as a lot of research still needs to be done
- only authenticate ElasticSearch requests coming from outside the server (hysds_ui, etc.)
- internal processes can hit ES directly without having to fetch an access_token beforehand
- Apache’s OpenID Connect library
- NGINX OpenID Connect implementation
  - uses OpenResty so it’ll require additional setup
  - NGINX Plus supports OpenID but it’s not free
- current research is documented in the repo: GitHub - DustinKLo/nginx-openid-demo: Proxy level authentication with nginx, keycloak & elasticsearch
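The rule "authenticate external requests, let internal processes through" would normally live in the Apache or nginx configuration. Purely to illustrate the decision itself, here is a Python sketch. The function name and the default list of trusted networks (loopback only) are assumptions, not part of the current research.

```python
import ipaddress


def requires_token(remote_addr: str,
                   trusted_networks=("127.0.0.0/8", "::1/128")) -> bool:
    """Return True if a request from remote_addr must present an access_token.

    Requests originating from a trusted (internal) network are allowed to
    reach ElasticSearch directly; everything else must be authenticated.
    """
    address = ipaddress.ip_address(remote_addr)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed-version comparisons simply evaluate to False
    return not any(
        address in ipaddress.ip_network(network) for network in trusted_networks
    )
```

Extending `trusted_networks` with a cluster's private subnet would cover workers running on other hosts, if that subnet can be trusted.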

HySDS-Core