Alexa, say Hello

Since it’s launch, I have always been intrigued by the Amazon Echo, so when Amazon announced a rare discount on the Echo as part of their prime day, I took the bait and ordered one. Gladly, it did not fail to impress. Without any thing else, the Echo is a gorgeous bluetooth speaker in itself, in fact one of the best bluetooth speakers that I have ever used. But the ‘skills’ (or apps of the Echo world) are what makes the Echo really standout and shine.

After playing around with the built in skills (like asking a forecast or turning on/off your lights) for a while, I decided it was time to do more and build my own skills for the Echo. Thankfully Amazon has some decent documentation and tutorials around making these skills for the Echo. In summary, you have to use their Alexa Skills Kit (ASK) to build Alexa skills. A couple of nights over a weekend and I was able to publish my own skill ( a cricket based game called Cricket Nerd) to the skills market place.which by the way grew quite popular among friends and family. I will admit that being at ease with AWS and Python, did make it easy for me.

Alexa ❤ AWS

A significant portion of my work involves interacting with Amazon Web Services (AWS). Now I knew the Echo was already did a great job at learning new ‘skills’ , I also knew how well the Alexa Skills Kit worked with AWS resources and that led me thinking what if we could actually give Alexa the skills to manage and control our AWS resources on the cloud? What if Alexa could do the routine tasks like starting and stopping servers on the cloud, providing status of your instances or maybe even take backups for you? While it does seem a little far fetched to start with, if you break this down, AWS already provides you with a very good set of building blocks with which you can do a lot of automation, like AWS CloudFormation, AWS Beanstalk and AWS Lambda.

So the next weekend I set out to build a support skill that will enable the Echo to do basic admin tasks with servers on the cloud. Building a voice interface is quite different from building traditional user interfaces (more on that later) , but after a few attempts I was able to get a working skill ready and Alexa (Echo) was able to do start and stop an EC2 (The AWS equivalent of a virtual cloud servers) as well as take snapshot backups of the volumes attached.

The Demo

It’s difficult to describe the actual skill here, so I shot a video demo of the skill in action. I’ll apologize in advance for the camera shake and the audio, this was shot late at night with a phone :)

Skill Abilities

To begin with, we will build a skill for the Echo giving it the ability to do the following

Get status for your instances provisioned in a specific region.
Start an EC2 instance.
Stop an EC2 instance.
Take snapshot backups for EC2.

Building Your Skill — Phase 1

Building a new skill can be broken down into three main phases, first of which starts at the Amazon developers portal here. Create a new account if you don’t have one (sign up is free). Once you are signed in, head over directly to the Alexa section and click on ‘Add a new skill button’ and you will be presented with a screen like the one below.

Give your skill a name and an invocation name. The invocation name is what the users will use to activate the skill by saying ‘ask’ or ‘open’. It is useful to read the invocation name guidelines to help avoid frustrations later when you are ready to test your skill.

The next screen is the one where you define your interaction model, which is your intent schema and sample utterances. This is where you lay the foundation for your voice driven user interface.

An intent schema is list of intents in a JSON format against which you can later map actions or functions to perform specific tasks. Slots are optional and are like arguments that you pass along with the intents that your function can use to process certain requests.

For our skill to manage EC2 resources, we used the following intent schema

"intents": [
 {
   "intent": "GetStartServer",
   "slots": [
      {
       "name": "Instance",
       "type": "INSTANCE_NAME"
     }
    ]
 },
 {
   "intent": "GetStopServer",
   "slots": [
      {
       "name": "Instance",
       "type": "INSTANCE_NAME"
     }
    ]
 },
  {
   "intent": "GetServerStatus",
   "slots": []
 },
{
   "intent": "GetSnapshot",
   "slots": [
 {
       "name": "Instance",
       "type": "INSTANCE_NAME"
     }
]
 },

.........

Sample utterances are what the user will speak when using your skill. All sample utterances must be matched to the intents above or with built-in intents. Remember a user can ask the same question in a number of ways and your sample utterance should try to cover as many scenarios as possible. Here are the list of utterances for skill.

AMAZON.StartOverIntent open
AMAZON.StartOverIntent hello
AMAZON.StartOverIntent hi
GetStartServer bring up {Instance}
GetStartServer boot up {Instance}
GetStartServer startup {Instance}
GetStartServer start instance
GetStartServer bring up server {Instance}
GetStopServer bring down {Instance}
GetStopServer shutdown {Instance}
GetStopServer stop {Instance}
GetServerStatus get instance status
GetServerStatus get status
GetServerStatus status
GetServerStatus server status
GetServerStatus get server status
GetServerStatus instance status
GetSnapshot snapshot {Instance}
GetSnapshot backup {Instance}
.....

Again the documentation on both these items is very good and I highly recommend that you go through them and understand the best practices of voice design.

Going Serverless With AWS Lambda — Phase 2

The Echo (via ASK) uses web services to consume and output JSON. You can either build your web services by deploying your own web server or you can go serverless and create web services directly using AWS Lambda that are invoked on certain (ASK) events.

There are a number of advantages of going serverless with Lambda, to name a few: you don’t need to manage your own servers, you don’t need to deploy SSL certificates for your web servers, you don’t need to have your servers up and running all the time and you can get out of box integration and metrics with AWS Lambda.

When it comes to coding for ASK, you have a couple options. You can either go with Nodejs or use Python. Since I was already using the Boto3 Python SDK for AWS, I stuck with python to be able to reuse the code as much as possible.

For the steps in this next section you will need an AWS account. Log on to your AWS console and head over to the Lambda section. It’s important that you choose the US N Virginia region, since as of writing this post the Echo was only able to interact with Lambda functions in the US N Virginia region. Choose, create a new lambda function. You can skip the blueprint section and head over directly to ‘Configure Triggers’. Here we will need to choose the ‘Alexa Skills Kit’ as the trigger for your lambda function.

In the next screen, among other things, one of the most important is where you define a role for your Lambda section. You can create a new role with full lambda access as well as ability to start,stop,describe and snapshot an EC2 instance. If you have an existing role, you can use that as well.

You are now finally ready to write some code! (For the sake of readability, I am not pasting the entire code here, you can find that on GitHub.)

In the first section, we import the necessary packages and define the default routes

from __future__ import print_function
import boto3
def lambda_handler(event, context):
    if event['request']['type'] == "LaunchRequest":
    return on_launch(event['request'], event['session'])
elif event['request']['type'] == "IntentRequest":
    return on_intent(event['request'], event['session'])
elif event['request']['type'] == "SessionEndedRequest":
    return on_session_ended(event['request'], event['session'])
def on_session_started(session_started_request, session):
    print("on_session_started requestId=" + session_started_request['requestId']
      + ", sessionId=" + session['sessionId'])
def on_launch(launch_request, session):
    print("on_launch requestId=" + launch_request['requestId'] +
      ", sessionId=" + session['sessionId'])
    return get_welcome_response()

Next we map the intents we defined earlier to our custom python methods.

def on_intent(intent_request, session):
    print("on_intent requestId=" + intent_request['requestId'] +
      ", sessionId=" + session['sessionId'])
    intent = intent_request['intent']
    intent_name = intent_request['intent']['name']
    if intent_name == "GetStartServer":
        return start_ec2(intent, session)
    elif intent_name == "GetStopServer":
        return stop_ec2(intent, session)
    elif intent_name == "GetServerStatus":
        return getStatus(intent, session)
    elif intent_name == "GetSnapshot":
        return getSnapshot(intent, session)
.....
.....
    elif intent_name == "AMAZON.CancelIntent" or intent_name == "AMAZON.StopIntent":
    return handle_session_end_request()
    else:
        raise ValueError("Invalid intent")

Once the mapping is done, it’s time to define our custom methods each for starting, stopping, backing up and getting status of our EC2 resources

def start_ec2(intent, session):
    card_title = "Starting"
    session_attributes = {}
    instance = intent['slots']['Instance']['value']
    ec2 = boto3.resource('ec2', region_name='us-west-2')
    instances = ec2.instances.all()
    instID = []
    for i in instances:
        for t in i.tags:
            if t['Value'] == instance:
                ID=i.id
    instID.append(ID)
    ec2Client = boto3.client('ec2', region_name='us-west-2')
    ec2Client.start_instances(InstanceIds=instID)
    speech_output = "Starting instance " + instance + " ,with instance ID, " + ID + ",Thank you for using C.T.R support. Goodbye!"
    should_end_session = True
    return build_response(session_attributes, build_speechlet_response(
    card_title, speech_output, None, should_end_session))

Lastly we will write the helper functions that will be used to build out the JSON responses for the Echo

def build_speechlet_response(title, output, reprompt_text, should_end_session):
    return {
        'outputSpeech': {
            'type': 'PlainText',
            'text': output
        },
        'card': {
            'type': 'Simple',
            'title': 'SessionSpeechlet - ' + title,
            'content': 'SessionSpeechlet - ' + output
        },
        'reprompt': {
            'outputSpeech': {
                'type': 'PlainText',
                'text': reprompt_text
            }
        },
        'shouldEndSession': should_end_session
    }
def build_response(session_attributes, speechlet_response):
    return {
        'version': '1.0',
        'sessionAttributes': session_attributes,
        'response': speechlet_response
    }

Once you are done with your lambda code, you will get an ARN url, in a format like

arn:aws:lambda:us-east-1:xxxxxxxxxxx, make a note of this ARN.

Linking Everything — Phase 3

Now head back to the developer portal and select the skill you started building. Go to the ‘configuration’ section and enter the ARN of your lambda function here.

That’s it, you should be able to test your skill on the portal directly by typing out sample voice commands in the test section. If you have an Alexa enabled device like the Echo linked your account, the skill is also available instantly on your device. However in order to publish your skill for others, you will first have to submit it for certification.

I have just scratched the surface here on what you can do with the Echo in terms of managing your AWS resources. You can do a lot more and hopefully this post will help you get started.

Happy building!

samx18.io

Alexa, open support to backup DEV