
Hands-On Examples for Working with DynamoDB, Boto3, and Python

by 로샤스 2021. 4. 7.

Ref. highlandsolutions.com/blog/hands-on-examples-for-working-with-dynamodb-boto3-and-python

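I meant to tidy this up but have not had the time. Think of it as a guide to handling DynamoDB with Python.
Please go to the link above and look at the examples directly. They are very helpful.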

In this post, we’ll get hands-on with AWS DynamoDB, the Boto3 package, and Python. In my experience, I’ve found the documentation around this technology can be scattered or incomplete. I’ll do my best to explain and provide examples for some of the most common use cases.

The easiest way to run these examples is to set up an AWS Lambda function using the Python 3.7 runtime. Also, make sure to assign a role to your function that has access to interact with the DynamoDB service.


The Basics

Let's start by creating a Users table with just a hash key:

import boto3
from boto3.dynamodb.conditions import Key

def create_table():
    dynamodb = boto3.resource('dynamodb')

    table = dynamodb.create_table(
        TableName='Users',
        KeySchema=[
            {
                'AttributeName': 'id',
                'KeyType': 'HASH'
            },
        ],
        AttributeDefinitions=[
            {
                'AttributeName': 'id',
                'AttributeType': 'N'
            },
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 1,
            'WriteCapacityUnits': 1,
        }
    )

    print("Table status:", table.table_status)
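
create_table returns as soon as the request is accepted, so the status printed above is typically CREATING rather than ACTIVE. Here is a minimal sketch of one way to wait for the table before writing to it, using the resource's wait_until_exists() waiter. The helper name create_users_table and the choice to pass the resource in as an argument are illustrative assumptions, not part of the original example.

```python
def create_users_table(dynamodb):
    """Create the Users table and wait until it exists.

    `dynamodb` is expected to be a boto3 DynamoDB resource, for example
    boto3.resource('dynamodb'). Passing it in is an illustrative choice.
    """
    table = dynamodb.create_table(
        TableName='Users',
        KeySchema=[{'AttributeName': 'id', 'KeyType': 'HASH'}],
        AttributeDefinitions=[{'AttributeName': 'id', 'AttributeType': 'N'}],
        ProvisionedThroughput={'ReadCapacityUnits': 1, 'WriteCapacityUnits': 1},
    )
    # Poll until DynamoDB reports that the table exists.
    table.wait_until_exists()
    # Refresh the cached attributes so table_status reflects the new state.
    table.reload()
    return table
```

Calling create_users_table(boto3.resource('dynamodb')) and then printing table.table_status should show the status after the wait rather than the initial one.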


Now that we have a table we can do an insert:

import boto3
from boto3.dynamodb.conditions import Key

def create_user():
    user = {
        'id': 1,
        'first_name': 'Jon',
        'last_name': 'Doe',
        'email': 'jdoe@test.com'
    }

    dynamodb = boto3.resource('dynamodb')

    table = dynamodb.Table('Users')

    table.put_item(Item=user)


Take note of the reserved words when defining your attributes: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ReservedWords.html
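
Reserved words matter mostly when an attribute name appears inside an expression such as UpdateExpression. Here is a minimal sketch of one way around that: alias the attribute name with a placeholder through ExpressionAttributeNames. The helper name set_attribute and the table argument are illustrative assumptions.

```python
def set_attribute(table, key, attr_name, value):
    """Set one attribute on an item, aliasing the attribute name.

    `table` is expected to be a boto3 Table resource (assumption).
    """
    resp = table.update_item(
        Key=key,
        # '#a' and ':v' are placeholders that DynamoDB substitutes
        # from the two mappings below.
        UpdateExpression='SET #a = :v',
        ExpressionAttributeNames={'#a': attr_name},
        ExpressionAttributeValues={':v': value},
        ReturnValues='UPDATED_NEW',
    )
    # With UPDATED_NEW the response carries the attributes that changed.
    return resp.get('Attributes', {})
```

For example, set_attribute(table, {'id': 1}, 'name', 'Jane') writes to an attribute called name, which is on the reserved list, without the reserved word ever appearing in the expression text.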

Next, we can retrieve our newly created record. There are two main techniques to do this: get_item and query (we’ll cover scans later). Both of these methods will have the same throughput. However, get_item returns a single item under the Item key, while query always returns a list under the Items key (even if we pass Limit=1). For this reason, it is good practice to use get_item when you have the full primary key.

get_item:

import boto3
from boto3.dynamodb.conditions import Key

def get_item():
    dynamodb = boto3.resource('dynamodb')

    table = dynamodb.Table('Users')

    resp = table.get_item(
        Key={
            'id' : 1,
        }
    )

    if 'Item' in resp:
        print(resp['Item'])

    #{'id': Decimal('1'), 'email': 'jdoe@test.com', 'last_name': 'Doe', 'first_name': 'Jon'}
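
Note the Decimal('1') in the output: the boto3 resource layer returns DynamoDB numbers as decimal.Decimal, which json.dumps cannot serialize on its own. Here is a minimal sketch of one way to convert an item before serializing it. The helper name from_decimal and the score attribute in the sample item are made up for illustration, and sets are deliberately not handled.

```python
import json
from decimal import Decimal


def from_decimal(value):
    """Recursively replace Decimal values with int or float."""
    if isinstance(value, Decimal):
        # Whole numbers become int; everything else becomes float,
        # which can lose precision for very large or very precise values.
        if value == value.to_integral_value():
            return int(value)
        return float(value)
    if isinstance(value, dict):
        return {k: from_decimal(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [from_decimal(v) for v in value]
    return value


# Sample item shaped like the get_item output, plus a made-up 'score'.
item = {'id': Decimal('1'), 'email': 'jdoe@test.com', 'score': Decimal('4.5')}
print(json.dumps(from_decimal(item)))
# {"id": 1, "email": "jdoe@test.com", "score": 4.5}
```

Converting to float is a trade-off. If exact decimal values matter, converting to str instead may be the safer choice.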


query:

import boto3
from boto3.dynamodb.conditions import Key

def query():
    dynamodb = boto3.resource('dynamodb')

    table = dynamodb.Table('Users')

    resp = table.query(
        KeyConditionExpression=Key('id').eq(1)
    )

    # 'Items' is always present in a query response, so check that the
    # list is non-empty before indexing into it
    if resp['Items']:
        print(resp['Items'][0])

    #{'id': Decimal('1'), 'email': 'jdoe@test.com', 'last_name': 'Doe', 'first_name': 'Jon'}


Next, let's update our record:

import boto3
from boto3.dynamodb.conditions import Key

def update():
    dynamodb = boto3.resource('dynamodb')

    table = dynamodb.Table('Users')

    table.update_item(
        Key={
            'id': 1,
        },
        UpdateExpression="set first_name = :g",
        ExpressionAttributeValues={
            ':g': "Jane"
        },
        ReturnValues="UPDATED_NEW"
    )

    # get_item() is the function defined in the get_item example above
    get_item()
    #{'email': 'jdoe@test.com', 'id': Decimal('1'), 'last_name': 'Doe', 'first_name': 'Jane'}


Alternatively, we could have used the same put_item method we used to create the item. Because put_item replaces the entire item, we would need to pass in all of the attributes and values (not just the one we wanted to update).
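
When only a few attributes change, update_item avoids resending the whole item. Here is a minimal sketch of building the update_item arguments from a plain dict of changes. The helper name build_update_kwargs and the #n0/:v0 placeholder naming are illustrative choices. Attribute names go through ExpressionAttributeNames so that a name that happens to be a reserved word does not break the expression.

```python
def build_update_kwargs(changes):
    """Turn {'attr': value, ...} into keyword arguments for update_item."""
    if not changes:
        raise ValueError('changes must contain at least one attribute')
    names, values, parts = {}, {}, []
    for i, (attr, value) in enumerate(changes.items()):
        names[f'#n{i}'] = attr      # placeholder for the attribute name
        values[f':v{i}'] = value    # placeholder for the new value
        parts.append(f'#n{i} = :v{i}')
    return {
        'UpdateExpression': 'SET ' + ', '.join(parts),
        'ExpressionAttributeNames': names,
        'ExpressionAttributeValues': values,
    }


kwargs = build_update_kwargs({'first_name': 'Jane', 'last_name': 'Doe'})
print(kwargs['UpdateExpression'])
# SET #n0 = :v0, #n1 = :v1
```

The result can be passed straight through, for example table.update_item(Key={'id': 1}, **kwargs).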

And finally, let's delete our record:

import boto3
from boto3.dynamodb.conditions import Key

def delete_user():
    dynamodb = boto3.resource('dynamodb')

    table = dynamodb.Table('Users')

    response = table.delete_item(
        Key={
            'id': 1,
        },
    )


Scans

A Scan operation in Amazon DynamoDB reads every item in a table or a secondary index. By default, a Scan operation returns all of the data attributes for every item in the table or index. You can use the ProjectionExpression parameter so that Scan only returns some of the attributes, rather than all of them.

Using the same table as above, let's go ahead and create a bunch of users.

import boto3
from boto3.dynamodb.conditions import Key

def create_bunch_of_users():
    dynamodb = boto3.resource('dynamodb')

    table = dynamodb.Table('Users')

    for n in range(3):
        table.put_item(Item={
            'id': n,
            'first_name': 'Jon',
            'last_name': 'Doe' + str(n),
            'email': 'jdoe'+ str(n) +'@test.com'
        })
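
The loop above sends one request per item. For larger batches, the Table resource offers a batch_writer() context manager that buffers the puts and sends them in groups. Here is a minimal sketch of the same inserts using it. The function name create_users_in_batch and its parameters are illustrative assumptions.

```python
def create_users_in_batch(table, count):
    """Insert `count` sample users through the table's batch writer.

    `table` is expected to be a boto3 Table resource (assumption).
    """
    # batch_writer buffers the puts and sends them in batches, flushing
    # whatever remains when the with-block exits.
    with table.batch_writer() as batch:
        for n in range(count):
            batch.put_item(Item={
                'id': n,
                'first_name': 'Jon',
                'last_name': 'Doe' + str(n),
                'email': 'jdoe' + str(n) + '@test.com',
            })
```

A call such as create_users_in_batch(boto3.resource('dynamodb').Table('Users'), 3) should leave the table in the same state as the loop above.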


Basic scan example:
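
Here is a minimal sketch of a basic scan, assuming the Users table populated above. The function name basic_scan and passing the table in as an argument are illustrative choices. With no ProjectionExpression, every attribute of every item comes back.

```python
def basic_scan(table):
    """Return the items from one scan call, with all of their attributes.

    `table` is expected to be a boto3 Table resource (assumption).
    """
    # A single scan call returns at most one page of results, which is
    # enough for a handful of small items.
    resp = table.scan()
    return resp['Items']
```

For example, print(basic_scan(boto3.resource('dynamodb').Table('Users'))) would print each user with id, first_name, last_name and email. The order of the items is not guaranteed.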

 

We can see above that all the attributes are being returned.

Here is an example of just scanning for all first & last names in the database:

import boto3
from boto3.dynamodb.conditions import Key

def scan_first_and_last_names():
    dynamodb = boto3.resource('dynamodb')

    table = dynamodb.Table('Users')

    resp = table.scan(ProjectionExpression="first_name, last_name")

    print(resp['Items'])

    '''
    [
        {'last_name': 'Doe2', 'first_name': 'Jon'},
        {'last_name': 'Doe1', 'first_name': 'Jon'},
        {'last_name': 'Doe0', 'first_name': 'Jon'}
    ]

    '''


Scans have a 1 MB limit on the data returned per call. If we think we’re going to exceed that, we should continue to re-scan and pass the LastEvaluatedKey from the previous response as ExclusiveStartKey:

import boto3
from boto3.dynamodb.conditions import Key

def multi_part_scan():
    dynamodb = boto3.resource('dynamodb')

    table = dynamodb.Table('Users')

    response = table.scan()
    result = response['Items']

    # Keep scanning from where the previous page stopped until no
    # LastEvaluatedKey is returned
    while 'LastEvaluatedKey' in response:
        response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
        result.extend(response['Items'])

    return result
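
The same LastEvaluatedKey loop applies to query as well as scan. Here is a minimal sketch of a generator that wraps either call and yields items page by page, instead of collecting them all in one list. The name iter_items and its signature are illustrative assumptions.

```python
def iter_items(operation, **kwargs):
    """Yield items from every page of a scan or query.

    `operation` is a bound method such as table.scan or table.query;
    `kwargs` are passed to it unchanged on every call.
    """
    while True:
        response = operation(**kwargs)
        yield from response['Items']
        last_key = response.get('LastEvaluatedKey')
        if last_key is None:
            return  # no more pages
        # Resume the next call where this page stopped.
        kwargs['ExclusiveStartKey'] = last_key
```

For example, for item in iter_items(table.scan, ProjectionExpression="first_name, last_name") walks the whole table while holding only one page in memory at a time.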
 
 


Hash + Range Key

Hash and Range Primary Key — The primary key is made of two attributes. The first attribute is the hash attribute and the second attribute is the range attribute. For example, the forum Thread table can have ForumName and Subject as its primary key, where ForumName is the hash attribute and Subject is the range attribute. DynamoDB builds an unordered hash index on the hash attribute and a sorted range index on the range attribute.
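
To make the hash and range idea concrete, here is a toy in-memory model: a dict keyed by the hash value, where each entry holds items kept sorted by the range value. This only illustrates the lookup behavior described above. It is not how DynamoDB is implemented, and the class name ToyTable and the sample threads are made up.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """Toy model: hash value -> list of (range value, item), kept sorted."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def put(self, hash_value, range_value, item):
        rows = self._partitions[hash_value]
        keys = [r for r, _ in rows]
        i = bisect.bisect_left(keys, range_value)
        if i < len(rows) and rows[i][0] == range_value:
            rows[i] = (range_value, item)        # same full key: overwrite
        else:
            rows.insert(i, (range_value, item))  # keep the partition sorted

    def query(self, hash_value, prefix=''):
        """Items for one hash value, in range-key order, filtered by prefix."""
        rows = self._partitions.get(hash_value, [])
        return [item for r, item in rows if r.startswith(prefix)]


# ForumName is the hash value and Subject is the range value.
threads = ToyTable()
threads.put('Python', 'Decorators', {'replies': 3})
threads.put('Python', 'Async IO', {'replies': 7})
threads.put('Go', 'Channels', {'replies': 1})
print(threads.query('Python'))
# [{'replies': 7}, {'replies': 3}]
```

The result comes back ordered by Subject ("Async IO" before "Decorators") regardless of insertion order, which is the property the range key provides.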

To demonstrate this next part, we’ll build a table for books. The title will be our hash key and author will be our range key.

import boto3
from boto3.dynamodb.conditions import Key

def create_table_with_range():
    dynamodb = boto3.resource('dynamodb')

    table = dynamodb.create_table(
        TableName='Books',
        KeySchema=[
            {
                'AttributeName': 'title',
                'KeyType': 'HASH'
            },
            {
                'AttributeName': 'author',
                'KeyType': 'RANGE'
            }
        ],
        AttributeDefinitions=[
            {
                'AttributeName': 'title',
                'AttributeType': 'S'
            },
            {
                'AttributeName': 'author',
                'AttributeType': 'S'
            },
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 1,
            'WriteCapacityUnits': 1,
        }
    )

    print("Table status:", table.table_status)

def create_books():
    book = {
        'title': "This is a Good Book",
        'author': 'Jon Doe',
        'year': '1980'
    }

    another_book = {
        'title': "This is a Good Book",
        'author': 'Jane Doe',
        'year': '1998'
    }

    dynamodb = boto3.resource('dynamodb')

    table = dynamodb.Table('Books')

    table.put_item(Item=book)
    table.put_item(Item=another_book)


And here is an example using range key with some of the techniques we learned above:

import boto3
from boto3.dynamodb.conditions import Key

def fetch_data_with_range():
    dynamodb = boto3.resource('dynamodb')

    table = dynamodb.Table('Books')

    resp = table.get_item(
        Key={
            'title' : 'This is a Good Book',
            'author': 'Jane Doe'
        }
    )

    print(resp['Item'])
    #{'year': '1998', 'title': 'This is a Good Book', 'author': 'Jane Doe'}


    resp = table.query(
        KeyConditionExpression=
            Key('title').eq('This is a Good Book') & Key('author').eq('Jon Doe')
    )

    print(resp['Items'][0])
    #{'year': '1980', 'title': 'This is a Good Book', 'author': 'Jon Doe'}
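
A query does not have to pin down the range key exactly. It can supply only the hash key, or narrow the range key with a condition such as begins_with. Here is a minimal sketch against the Books table, written with a string KeyConditionExpression and placeholders rather than the Key helper. The function name books_by_title and its parameters are illustrative assumptions.

```python
def books_by_title(table, title, author_prefix=None):
    """Return all Books items for a title, optionally narrowed by author.

    `table` is expected to be a boto3 Table resource (assumption).
    """
    # Attribute names are aliased with '#' placeholders so the expression
    # stays valid even if a name were a reserved word.
    expression = '#t = :t'
    names = {'#t': 'title'}
    values = {':t': title}
    if author_prefix is not None:
        # begins_with is evaluated against the range key.
        expression += ' AND begins_with(#a, :a)'
        names['#a'] = 'author'
        values[':a'] = author_prefix
    resp = table.query(
        KeyConditionExpression=expression,
        ExpressionAttributeNames=names,
        ExpressionAttributeValues=values,
    )
    return resp['Items']
```

With the two sample books above, books_by_title(table, 'This is a Good Book') should return both, while adding author_prefix='Jane' should return only the 1998 one.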


Global Secondary Index (GSI)

Some applications might need to perform many kinds of queries, using a variety of different attributes as query criteria. To support these requirements, you can create one or more global secondary indexes and issue Query requests against these indexes in Amazon DynamoDB.

To illustrate this, we’re going to create an Employees table with emp_id as our hash key and a GSI on the email attribute.

import boto3
from boto3.dynamodb.conditions import Key

def create_table_with_gsi():
    dynamodb = boto3.resource('dynamodb')

    table = dynamodb.create_table(
        TableName='Employees',
        KeySchema=[
            {
                'AttributeName': 'emp_id',
                'KeyType': 'HASH'
            },
        ],
        AttributeDefinitions=[
            {
                'AttributeName': 'emp_id',
                'AttributeType': 'N'
            },
            {
                'AttributeName': 'email',
                'AttributeType': 'S'
            },

        ],
        GlobalSecondaryIndexes=[
            {
                'IndexName': 'email',
                'KeySchema': [
                    {
                        'AttributeName': 'email',
                        'KeyType': 'HASH'
                    },
                ],
                'Projection': {
                    'ProjectionType': 'ALL'
                },
                'ProvisionedThroughput' :{
                    'ReadCapacityUnits': 1,
                    'WriteCapacityUnits': 1,
                }
            }
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 1,
            'WriteCapacityUnits': 1,
        }
    )

    print("Table status:", table.table_status)

def create_employee():
    user = {
        'emp_id': 1,
        'first_name': 'Jon',
        'last_name': 'Doe',
        'email': 'jdoe@test.com'
    }

    dynamodb = boto3.resource('dynamodb')

    table = dynamodb.Table('Employees')

    table.put_item(Item=user)


And here is an example of a query with a GSI:

import boto3
from boto3.dynamodb.conditions import Key

def query_data_with_gsi():
    dynamodb = boto3.resource('dynamodb')

    table = dynamodb.Table('Employees')

    response = table.query(
        IndexName='email',
        KeyConditionExpression=Key('email').eq('jdoe@test.com')
    )

    print(response['Items'][0])
    #{'last_name': 'Doe', 'email': 'jdoe@test.com', 'first_name': 'Jon', 'emp_id': Decimal('1')}


At the time of writing, get_item on a GSI is not supported.
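
Since get_item cannot target an index, one workaround is a query with Limit=1 that returns the first match or None. Here is a minimal sketch of that idea. The helper name get_one_by_index and its parameters are illustrative assumptions. Index keys do not have to be unique, so if several items share the value this returns only one of them.

```python
def get_one_by_index(table, index_name, key_name, key_value):
    """Return one item matching a secondary index hash key, or None.

    `table` is expected to be a boto3 Table resource (assumption).
    """
    resp = table.query(
        IndexName=index_name,
        # The key name is aliased so any attribute name can be passed in.
        KeyConditionExpression='#k = :v',
        ExpressionAttributeNames={'#k': key_name},
        ExpressionAttributeValues={':v': key_value},
        Limit=1,  # stop after the first match
    )
    items = resp['Items']
    return items[0] if items else None
```

Against the Employees table above, get_one_by_index(table, 'email', 'email', 'jdoe@test.com') should return the Jon Doe item, and an unknown address should return None.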

Local Secondary Index (LSI)

Some applications only need to query data using the base table’s primary key. However, there might be situations where an alternative sort key would be helpful. To give your application a choice of sort keys, you can create one or more local secondary indexes on an Amazon DynamoDB table and issue Query or Scan requests against these indexes.

To demonstrate this, we’re going to create a Posts table with user_name as our hash key and title as our range key, and we’ll have an LSI on user_name and subject.

import boto3
from boto3.dynamodb.conditions import Key

def create_table_with_lsi():
    dynamodb = boto3.resource('dynamodb')

    table = dynamodb.create_table(
        TableName='Posts',
        KeySchema=[
            {
                'AttributeName': 'user_name',
                'KeyType': 'HASH'
            },
            {
                'AttributeName': 'title',
                'KeyType': 'RANGE'
            }
        ],
        AttributeDefinitions=[
            {
                'AttributeName': 'title',
                'AttributeType': 'S'
            },
            {
                'AttributeName': 'user_name',
                'AttributeType': 'S'
            },
            {
                'AttributeName': 'subject',
                'AttributeType': 'S'
            },

        ],
        LocalSecondaryIndexes=[
            {
                'IndexName': 'user_name_subject',
                'KeySchema': [
                    {
                        'AttributeName': 'user_name',
                        'KeyType': 'HASH'
                    },
                    {
                        'AttributeName': 'subject',
                        'KeyType': 'RANGE'
                    },
                ],
                'Projection': {
                    'ProjectionType': 'ALL'
                },
            }
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 1,
            'WriteCapacityUnits': 1,
        }
    )

    print("Table status:", table.table_status)

def create_posts():
    post1 = {
        'title': "My favorite hiking spots",
        'user_name': 'jon_doe',
        'subject': 'hiking'
    }

    post2 = {
        'title': "My favorite recipes",
        'user_name': 'jon_doe',
        'subject': 'cooking'
    }

    post3 = {
        'title': "I love hiking!",
        'user_name': 'jane_doe',
        'subject': 'hiking'
    }

    dynamodb = boto3.resource('dynamodb')

    table = dynamodb.Table('Posts')

    table.put_item(Item=post1)
    table.put_item(Item=post2)
    table.put_item(Item=post3)


And here is an example of a query with an LSI:

import boto3
from boto3.dynamodb.conditions import Key

def query_data_with_lsi():
    dynamodb = boto3.resource('dynamodb')

    table = dynamodb.Table('Posts')

    response = table.query(
        IndexName='user_name_subject',
        KeyConditionExpression=
            Key('user_name').eq('jon_doe') & Key('subject').eq('hiking')
    )

    print(response['Items'][0])
    #{'subject': 'hiking', 'user_name': 'jon_doe', 'title': 'My favorite hiking spots'}
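
Because the LSI offers an alternative sort key, querying it with only the hash key returns one user's posts ordered by subject instead of by title. Here is a minimal sketch of that, with ScanIndexForward controlling the direction. The function name posts_in_subject_order and its parameters are illustrative assumptions.

```python
def posts_in_subject_order(table, user_name, descending=False):
    """Return one user's posts ordered by subject, using the LSI.

    `table` is expected to be a boto3 Table resource for Posts (assumption).
    """
    resp = table.query(
        IndexName='user_name_subject',
        KeyConditionExpression='#u = :u',
        ExpressionAttributeNames={'#u': 'user_name'},
        ExpressionAttributeValues={':u': user_name},
        # ScanIndexForward=True sorts ascending by the index's range key.
        ScanIndexForward=not descending,
    )
    return resp['Items']
```

With the sample posts above, posts_in_subject_order(table, 'jon_doe') should list the cooking post before the hiking one, and descending=True should reverse that.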


At the time of writing, get_item on an LSI is not supported.

As stated in the intro, I just wanted to bring all these examples into one place. I hope that this can be a helpful resource for some of the most common use cases. 
