2. Develop for Azure storage (15-20%)
2.1 Develop solutions that use Cosmos DB storage
select the appropriate API and SDK for a solution
- Azure Cosmos DB
- data is stored in atom-record-sequence (ARS) format
- data is abstracted/projected as an API that is specified when creating the database
- Choices
- Consistency levels
- strong, bounded staleness, session, consistent prefix, eventual
- Multi-model
- document, key-value, wide-column, graph-based
- Data access and data manipulation
- access and manipulate data using familiar, industry-standard APIs
- Azure Cosmos DB account APIs
- Core (SQL)
- traditional NoSQL document store: query hierarchical JSON documents with SQL-like language
- e.g.:
SELECT, FROM, WHERE, ...
- uses JavaScript’s type system, expression evaluation and function invocation
- MongoDB
- supports MongoDB wire protocol: allows existing MongoDB client SDKs, drivers and tools to interact with the data
- JSON document format queries:
- e.g.:
db.Items.find({},{productName:1,_id:0})
- Cassandra
- query data by using the Cassandra Query Language (CQL); data appears as a partitioned row store
- e.g.:
CREATE KEYSPACE, CREATE TABLE, ALTER TABLE, USE, INSERT, SELECT, UPDATE, DELETE...
CREATE TABLE Catalog.Items(id text, productName text, description text, supplier text, quantity int, unitCost float, retailPrice float, categories map<text,text>, primary key (id));
SELECT id, productName FROM catalog.items
- Azure Table
- supports applications that are written for Azure Table Storage but need premium capabilities (global distribution, high availability, scalable throughput)
- often used for IoT data
- Azure Table Storage itself indexes only the Partition and Row keys; no secondary indexes exist
- the Cosmos DB Table API automatically indexes all the properties and requires no index management
- querying is accomplished by using OData and LINQ queries in code + original REST API for GET operations
- e.g.:
SELECT i.productName FROM Items i
- Gremlin (graph)
- provides graph-based view over the data
- vertex: individual item in database
- edge: relationship between items in database
- traversal query language: Apache TinkerPop Gremlin
- e.g.:
```js
// add 3 vertices (products)
g.addV('product').property('productName', 'Industrial Saw').property('description', 'Cuts through anything').property('quantity', 261)
g.addV('product').property('productName', 'Belt Sander').property('description', 'Smoothes rough edges').property('quantity', 312)
g.addV('product').property('productName', 'Cordless Drill').property('description', 'Bores holes').property('quantity', 647)

// add 2 edges (product relationships)
g.V().hasLabel('product').has('productName', 'Industrial Saw').addE('boughtWith').to(g.V().hasLabel('product').has('productName', 'Belt Sander'))
g.V().hasLabel('product').has('productName', 'Industrial Saw').addE('boughtWith').to(g.V().hasLabel('product').has('productName', 'Cordless Drill'))

// query products
g.V().hasLabel('product').has('productName', 'Industrial Saw').outE('boughtWith')
```
- returned results are in GraphSON format
- e.g.:
{ "id": "6c69fba7-2f76-421f-a24e-92d4b8295d67", "label": "boughtWith", "type": "edge", "inVLabel": "product", "outVLabel": "product", "inV": "faaf0997-f5d8-4d01-a527-ae29534ac234", "outV": "a9b13b8f-258f-4148-99c0-f71b30918146" }
- Decision flow (figure): flowchart for choosing the appropriate Cosmos DB API
implement partitioning schemes and partition keys
- partitioning schemes: scale individual containers in a database to meet the performance needs of your application
- logical partitions: items in a container are divided into distinct subsets
- based on the value of a partition key: associated with each item in a container, all items in a logical partition have the same partition key value
- item index = partition key + item ID (together they uniquely identify an item)
- choosing the right partition key is important for performance
- defines the scope of database transactions
- update items using a transaction with snapshot isolation
- physical partitions: container is scaled by distributing data and throughput across physical partitions
- internally, one or more logical partitions are mapped to a single physical partition
- a container can have many logical partitions while requiring only a single physical partition
- see a container's physical partitions in the Azure Portal -> Storage -> Metrics blade
- choosing a partition key: requirements
- should be an immutable property
- should have high cardinality
- request unit (RU) consumption and data storage should be spread evenly across all logical partitions
- for large read-heavy containers: choose partition key that appears frequently as a filter in your queries -> reduces cross-partition queries
- for small read/write-heavy containers: use item ID as partition key
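A minimal sketch of applying these rules when creating a container with the .NET SDK (assuming Microsoft.Azure.Cosmos v3; the database, container, and partition key names are illustrative):

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static async Task<Container> CreateItemsContainerAsync(CosmosClient client)
{
    Database database = client.GetDatabase("CatalogDb"); // hypothetical database

    // "/supplier" is an illustrative partition key path: an immutable,
    // high-cardinality property that appears frequently as a query filter
    ContainerProperties properties = new ContainerProperties(
        id: "Items",
        partitionKeyPath: "/supplier");

    // RU/s throughput is spread evenly across the container's logical partitions
    return await database.CreateContainerIfNotExistsAsync(properties, throughput: 400);
}
```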
perform operations on data and Cosmos DB containers
- Data Explorer: Azure Portal -> Databases -> Azure Cosmos DB -> select database -> Data Explorer
- Add new items
- Query/filter data using:
- SQL queries
- Geospatial queries (GeoJSON points - coordinates)
- stored procedures
- perform complex transactions on documents and properties
- written in JavaScript, stored in container on Azure Cosmos DB
- improves performance over client-side programming
- the primary way to achieve atomic multi-document transactions within Azure Cosmos DB; a transaction is scoped to a single logical partition
- client-side SDKs historically did not support transactions
- e.g.:
```js
function helloWorld() {
    var context = getContext();
    var response = context.getResponse();
    response.setBody("Hello, World");
}
```
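A hedged sketch of invoking that stored procedure from the .NET SDK v3 via Container.Scripts (the partition key value below is an assumption; stored procedures always execute against a single logical partition):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Cosmos.Scripts;

public static async Task RunHelloWorldAsync(Container container)
{
    // Execute "helloWorld" in the logical partition for an illustrative key value
    StoredProcedureExecuteResponse<string> response =
        await container.Scripts.ExecuteStoredProcedureAsync<string>(
            "helloWorld",
            new PartitionKey("pk-value"),   // hypothetical partition key value
            null);                          // no input parameters

    Console.WriteLine(response.Resource);   // "Hello, World"
}
```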
- user-defined functions (UDF)
- extend Azure Cosmos DB SQL query language grammar
- implement custom business logic, such as calculations on properties/documents
- can only be called from inside queries (unlike stored procedures), they do not have access to the context object
- e.g.:
```js
function producttax(price) {
    if (price == undefined)
        throw 'no input';

    var amount = parseFloat(price);

    if (amount < 1000)
        return amount * 0.1;
    else if (amount < 10000)
        return amount * 0.2;
    else
        return amount * 0.4;
}
```
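Once registered, a UDF is called inside queries with the udf. prefix. A minimal sketch of registering and querying producttax, assuming the .NET SDK v3 and an illustrative price property:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Cosmos.Scripts;

public static async Task RegisterAndQueryTaxUdfAsync(Container container)
{
    // Register the UDF (body elided; see the full producttax function above)
    await container.Scripts.CreateUserDefinedFunctionAsync(
        new UserDefinedFunctionProperties
        {
            Id = "producttax",
            Body = "function producttax(price) { /* ... */ }"
        });

    // UDFs can only be called from inside a query, via the udf. prefix
    QueryDefinition query = new QueryDefinition(
        "SELECT c.id, udf.producttax(c.price) AS producttax FROM c");

    FeedIterator<dynamic> iterator = container.GetItemQueryIterator<dynamic>(query);
    while (iterator.HasMoreResults)
    {
        foreach (var item in await iterator.ReadNextAsync())
        {
            Console.WriteLine(item);
        }
    }
}
```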
set the appropriate consistency level for operations
- Consistency levels:
- Strong consistency
- linearizability guarantee: reads are guaranteed to return most recent committed version of an item, client never sees uncommitted or partial write
- Bounded staleness:
- consistent-prefix guarantee: reads might lag behind writes by “K” versions or by “T” time interval
- staleness can be configured in 2 ways:
- number of versions (K) of the item reads might lag behind writes, e.g. 100 versions
- time interval (T) reads might lag behind writes, e.g. 5 seconds
- Session
- within a single client session reads are guaranteed to honor the:
- consistent-prefix guarantees
- monotonic reads/writes guarantees
- read-your-writes guarantees
- write-follows-reads guarantees
- assumes a single “writer” session or sharing the session token for multiple writers
- Consistent prefix
- updates that are returned contain some prefix of all the updates, with no gaps
- Eventual
- no ordering guarantee for reads
- in absence of further writes, the replicas eventually converge
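The default consistency level is configured at the account level; a client can then weaken (never strengthen) it per connection. A minimal sketch with the .NET SDK v3:

```csharp
using Microsoft.Azure.Cosmos;

// Request Eventual consistency for this client, relaxing the account default
CosmosClientOptions options = new CosmosClientOptions
{
    ConsistencyLevel = ConsistencyLevel.Eventual
};

CosmosClient client = new CosmosClient("<connection-string>", options);
```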
manage change feed notifications
- persistent record of changes to a container in the order they occur
- works by listening to an Azure Cosmos container for any changes
- outputs the sorted list of documents that were changed in the order in which they were modified
- persisted changes can be processed asynchronously and incrementally
- output can be distributed across one or more consumers for parallel processing
- working with change feed
- with Azure Functions: create small reactive Azure Functions that are automatically triggered on each new event in the change feed (see the trigger sketch after the implementation below)
- with the change feed processor: simplifies reading the change feed and distributes event processing across multiple consumers effectively
- components:
- monitored container: has the data from which the change feed is generated
- lease container: acts as a state storage and coordinates processing across multiple workers
- host: application instance that uses the change feed processor to listen for changes, multiple instances with the same lease configuration can run in parallel
- delegate: code that defines what you want to do with each batch of changes that the change feed processor reads
- implementation:
- from the monitored container's Container instance (the code entry point), call GetChangeFeedProcessorBuilder
- e.g.:
```csharp
/// <summary>
/// Start the Change Feed Processor to listen for changes and process them with the HandleChangesAsync implementation.
/// </summary>
private static async Task<ChangeFeedProcessor> StartChangeFeedProcessorAsync(
    CosmosClient cosmosClient,
    IConfiguration configuration)
{
    string databaseName = configuration["SourceDatabaseName"];
    string sourceContainerName = configuration["SourceContainerName"];
    string leaseContainerName = configuration["LeasesContainerName"];

    Container leaseContainer = cosmosClient.GetContainer(databaseName, leaseContainerName);
    ChangeFeedProcessor changeFeedProcessor = cosmosClient.GetContainer(databaseName, sourceContainerName)
        .GetChangeFeedProcessorBuilder<ToDoItem>(processorName: "changeFeedSample", onChangesDelegate: HandleChangesAsync)
            .WithInstanceName("consoleHost")
            .WithLeaseContainer(leaseContainer)
            .Build();

    Console.WriteLine("Starting Change Feed Processor...");
    await changeFeedProcessor.StartAsync();
    Console.WriteLine("Change Feed Processor started.");
    return changeFeedProcessor;
}
```
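The HandleChangesAsync delegate wired into the builder above is where the per-batch logic lives; a minimal sketch, assuming the same ToDoItem type:

```csharp
// requires: using System; using System.Collections.Generic;
//           using System.Threading; using System.Threading.Tasks;

/// <summary>
/// The delegate receives each batch of changes read from the change feed.
/// </summary>
static async Task HandleChangesAsync(
    IReadOnlyCollection<ToDoItem> changes,
    CancellationToken cancellationToken)
{
    Console.WriteLine($"Handling batch of {changes.Count} changes...");

    foreach (ToDoItem item in changes)
    {
        // Illustrative processing: log the changed item's id
        Console.WriteLine($"Detected change for item with id {item.id}.");
        await Task.Delay(10, cancellationToken); // simulate asynchronous work
    }
}
```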
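For the Azure Functions route mentioned earlier, a hedged sketch of a change feed trigger (assuming the Microsoft.Azure.WebJobs.Extensions.CosmosDB v4 extension; the database, container, and app-setting names are illustrative):

```csharp
using System.Collections.Generic;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ItemsChangeFeedFunction
{
    [FunctionName("ItemsChangeFeed")]
    public static void Run(
        [CosmosDBTrigger(
            databaseName: "CatalogDb",           // assumed database
            containerName: "Items",              // assumed monitored container
            Connection = "CosmosDBConnection",   // app setting holding the connection string
            LeaseContainerName = "leases",
            CreateLeaseContainerIfNotExists = true)]
        IReadOnlyList<ToDoItem> changes,
        ILogger log)
    {
        // Triggered automatically for each batch of changes in the feed
        foreach (ToDoItem item in changes)
        {
            log.LogInformation($"Changed item: {item.id}");
        }
    }
}
```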
2.2 Develop solutions that use blob storage
move items in Blob storage between storage accounts or containers
- synchronous and asynchronous methods for transferring blobs
- move blobs between Azure storage accounts using:
- .NET Storage Client library
- collection of objects/methods that you can use to build custom applications that manipulate items held in Azure Storage
- construct your own applications to upload/download/migrate blobs between storage accounts
- incorporate your code into existing applications, deploy code to a variety of environments (e.g.: Azure Functions)
```csharp
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

// The variable sourceConnection is a string holding the connection string for the storage account
CloudStorageAccount sourceAccount = CloudStorageAccount.Parse(sourceConnection);
CloudBlobClient sourceClient = sourceAccount.CreateCloudBlobClient();

// Download a blob
CloudBlobContainer sourceBlobContainer = sourceClient.GetContainerReference(sourceContainer);
ICloudBlob sourceBlob = await sourceBlobContainer.GetBlobReferenceFromServerAsync("MyBlob.doc");
Console.WriteLine($"Last modified: {sourceBlob.Properties.LastModified}");
await sourceBlob.DownloadToFileAsync("MyFile.doc", System.IO.FileMode.Create);

// Upload a blob
CloudBlobContainer destBlobContainer = destClient.GetContainerReference(destContainer);
CloudBlockBlob destBlob = destBlobContainer.GetBlockBlobReference("NewBlob.doc");
await destBlob.UploadFromFileAsync("MyFile.doc");

// Delete a blob
bool blobExisted = await sourceBlob.DeleteIfExistsAsync();

// Iterate blobs in a container
BlobContinuationToken token = null;
do
{
    BlobResultSegment segment = await blobContainer.ListBlobsSegmentedAsync(prefix: "", currentToken: token);
    token = segment.ContinuationToken; // advance the token so the loop terminates
    foreach (CloudBlockBlob blobItem in segment.Results)
    {
        // Process the current blob
        ICloudBlob blob = await blobContainer.GetBlobReferenceFromServerAsync(blobItem.Name);
        // ...
    }
} while (token != null);
```
```csharp
// Copy a blob between storage accounts
CloudBlockBlob destBlob = destContainer.GetBlockBlobReference(sourceBlob.Name);
await destBlob.StartCopyAsync(new Uri(GetSharedAccessUri(sourceBlob.Name, sourceContainer)));

// Create a SAS token for the source blob, to enable it to be read by the StartCopyAsync method
private static string GetSharedAccessUri(string blobName, CloudBlobContainer container)
{
    DateTime toDateTime = DateTime.Now.AddMinutes(60);

    SharedAccessBlobPolicy policy = new SharedAccessBlobPolicy
    {
        Permissions = SharedAccessBlobPermissions.Read,
        SharedAccessStartTime = null,
        SharedAccessExpiryTime = new DateTimeOffset(toDateTime)
    };

    CloudBlockBlob blob = container.GetBlockBlobReference(blobName);
    string sas = blob.GetSharedAccessSignature(policy);
    return blob.Uri.AbsoluteUri + sas;
}
```
- Azure CLI: az storage blob commands
- upload/download blobs between blob storage and the local file system, e.g.:
az storage blob upload-batch \
  --destination specifications \
  --pattern "*.md" \
  --source ~/sample/specifications \
  --account-name $HOT_STORAGE_NAME \
  --account-key $HOT_KEY
- az storage blob copy: copies blobs between storage accounts/containers
- runs asynchronously
- uses the Azure Storage service to manage the copy process
- AzCopy tool
- transfer data in/out/between Azure storage accounts
- upload data:
azcopy copy "myfile.txt" "https://myaccount.blob.core.windows.net/mycontainer/"
- download data:
azcopy copy "https://myaccount.blob.core.windows.net/mycontainer/myblob?<sas token>" "myblobdata"
- between storage accounts:
azcopy copy "https://sourceaccount.blob.core.windows.net/sourcecontainer/*?<source sas token>" "https://destaccount.blob.core.windows.net/destcontainer/*?<dest sas token>"
- supports hierarchical containers
- supports selection by pattern matching
set and retrieve properties and metadata
- Set/retrieve system properties: Azure Storage client library for .NET
- exist on each Blob storage resource, some correspond to certain standard HTTP headers
- e.g.: set the ContentType and ContentLanguage system properties using the SetHttpHeaders or SetHttpHeadersAsync calls:
```csharp
public static async Task SetBlobPropertiesAsync(BlobClient blob)
{
    Console.WriteLine("Setting blob properties...");
    try
    {
        // Get the existing properties
        BlobProperties properties = await blob.GetPropertiesAsync();

        BlobHttpHeaders headers = new BlobHttpHeaders
        {
            // Set the MIME ContentType every time the properties
            // are updated or the field will be cleared
            ContentType = "text/plain",
            ContentLanguage = "en-us",

            // Populate remaining headers with
            // the pre-existing properties
            CacheControl = properties.CacheControl,
            ContentDisposition = properties.ContentDisposition,
            ContentEncoding = properties.ContentEncoding,
            ContentHash = properties.ContentHash
        };

        // Set the blob's properties.
        await blob.SetHttpHeadersAsync(headers);
    }
    catch (RequestFailedException e)
    {
        Console.WriteLine($"HTTP error code {e.Status}: {e.ErrorCode}");
        Console.WriteLine(e.Message);
        Console.ReadLine();
    }
}
```
- Set/retrieve user-defined metadata
- name-value pairs in the Metadata collection on the resource
- set using: SetMetadata and SetMetadataAsync
- e.g.:
```csharp
public static async Task AddBlobMetadataAsync(BlobClient blob)
{
    Console.WriteLine("Adding blob metadata...");
    try
    {
        IDictionary<string, string> metadata = new Dictionary<string, string>();

        // Add metadata to the dictionary by calling the Add method
        metadata.Add("docType", "textDocuments");

        // Add metadata to the dictionary by using key/value syntax
        metadata["category"] = "guidance";

        // Set the blob's metadata.
        await blob.SetMetadataAsync(metadata);
    }
    catch (RequestFailedException e)
    {
        Console.WriteLine($"HTTP error code {e.Status}: {e.ErrorCode}");
        Console.WriteLine(e.Message);
        Console.ReadLine();
    }
}
```
- get using: GetProperties and GetPropertiesAsync
- e.g.:
```csharp
public static async Task ReadBlobMetadataAsync(BlobClient blob)
{
    try
    {
        // Get the blob's properties and metadata.
        BlobProperties properties = await blob.GetPropertiesAsync();

        Console.WriteLine("Blob metadata:");

        // Enumerate the blob's metadata.
        foreach (var metadataItem in properties.Metadata)
        {
            Console.WriteLine($"\tKey: {metadataItem.Key}");
            Console.WriteLine($"\tValue: {metadataItem.Value}");
        }
    }
    catch (RequestFailedException e)
    {
        Console.WriteLine($"HTTP error code {e.Status}: {e.ErrorCode}");
        Console.WriteLine(e.Message);
        Console.ReadLine();
    }
}
```
perform operations on data by using the appropriate SDK
- Azure Blob Storage client library v12 for .NET: Azure.Storage.Blobs
- Upload blobs
await blobClient.UploadAsync(localFilePath, true);
- List blobs
await foreach (BlobItem blobItem in containerClient.GetBlobsAsync()) { Console.WriteLine("\t" + blobItem.Name); }
- Download blobs
await blobClient.DownloadToAsync(downloadFilePath);
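The snippets above assume an initialized client hierarchy; a minimal setup sketch (the connection string, container, and blob names are placeholders):

```csharp
using Azure.Storage.Blobs;

// Placeholder: connection string from the storage account's Access keys blade
string connectionString = "<storage-account-connection-string>";

BlobServiceClient blobServiceClient = new BlobServiceClient(connectionString);

// Illustrative container and blob names
BlobContainerClient containerClient =
    await blobServiceClient.CreateBlobContainerAsync("sample-container");
BlobClient blobClient = containerClient.GetBlobClient("sample-blob.txt");
```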
implement storage policies, and data archiving and retention
- Data archiving and retention policies:
- with a time-based retention policy, all blobs in the container stay in an immutable state for the duration of the effective retention period
- Configure: Azure Portal -> Azure Time Series Insights environment
- Settings -> Storage configuration
- Data retention time (in days)
- Capacity
- Storage limit exceeded behavior
- Purge old data
- Pause ingress