Alfresco is a web-based application. Most organizations that use it host the software on a single internet-facing server that is accessed by people all over the globe. These distances create obvious performance issues, even for organizations that don’t have a truly global use case. The problem can be addressed by several solutions that stack on top of each other. For example, in our previous post we discussed how to connect Amazon S3 with Alfresco Community, which also improves the performance and availability of data quite a bit. But that is not the only option. You can also leverage other Amazon products, such as Amazon CloudFront, to improve your content delivery and response times, and even save some money.
Why Amazon CloudFront
The world is a very big place, and the Internet is even bigger. Every website that you visit and every resource that you request or send travels through this huge interlinked web to reach its destination. Sometimes the route is fairly direct, passing through maybe 2 or 3 servers, but most of the time it is far more complex, involving many servers and many requests.
For example, if you run the traceroute command (tracert in the Windows command prompt) for www.alfresco.com, you will probably see at least 10 hops before reaching the final one, depending on your location (I got 17 hops). You may also notice the ping times. Latency accumulates as the number of hops increases, ultimately increasing the response times.
The same is true for web-based applications like Alfresco. The situation becomes more problematic when your Alfresco resource libraries grow to thousands of items.
This is where CloudFront comes in. CloudFront is a content delivery network that drastically improves delivery times. It caches your resources on servers at each of its edge locations. When a request arrives, the edge location serves its cached copy if it has one; otherwise it fetches the resource from the origin server (the server where the resource is stored). Delivery gets faster still when your origin server is an Amazon S3 bucket.
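The "serve the cached copy, otherwise go to the origin" behaviour can be pictured with a small Java sketch. This is only a simplified model for illustration, not how CloudFront is implemented: the class name EdgeCacheSketch and its members are hypothetical, and real edge locations also handle expiry, invalidation, and much more.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Simplified, hypothetical model of one edge location in front of an origin server.
public class EdgeCacheSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> origin; // stands in for the origin server (e.g. an S3 bucket)
    int originFetches = 0;                          // counts the slow round trips to the origin

    EdgeCacheSketch(Function<String, String> origin) {
        this.origin = origin;
    }

    // Return the cached copy if present; otherwise fetch from the origin and cache the result.
    String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;                          // cache hit: no trip to the origin
        }
        originFetches++;                            // cache miss: one trip to the origin
        String fresh = origin.apply(key);
        cache.put(key, fresh);
        return fresh;
    }

    public static void main(String[] args) {
        EdgeCacheSketch edge = new EdgeCacheSketch(k -> "content of " + k);
        edge.get("thumb-1.png"); // miss, fetched from the origin
        edge.get("thumb-1.png"); // hit, served from the edge
        edge.get("thumb-2.png"); // miss
        System.out.println("Requests: 3, origin fetches: " + edge.originFetches);
    }
}
```

Only the first request for each resource pays the full distance to the origin; repeat requests are answered from the nearby edge.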
So, given that we are storing our resources on Amazon S3 with Alfresco as our main content repository, let’s look at how we can configure and connect Alfresco, Amazon S3, and Amazon CloudFront to build a fast, high-performing solution.
Create and Configure Web Distribution
The first step is to create and configure a CloudFront web distribution and prepare it for connecting with Alfresco and Amazon S3.
- Step 1: Open https://console.aws.amazon.com/cloudfront/home
- Step 2: Click on “Create Distribution”.
- Step 3: Click on “Get Started” for “Web”.
- Step 4: In the “Origin Settings” section, select the “Origin Domain Name”, which will be your S3 bucket’s domain.
- Step 5: Choose “Yes” for “Restrict Bucket Access”. This makes it mandatory for users to access the Amazon S3 bucket through CloudFront URLs only, making the final solution a little more secure.
- Step 6: For “Origin Access Identity”, choose “Create a New Identity”. If you already have an Origin Access Identity, you can choose that instead.
- Step 7: The “Comment” field will be auto-populated.
- Step 8: Select “Yes, Update Bucket Policy” for “Grant Read Permissions on Bucket”.
- Step 9: In the “Default Cache Behavior Settings” section, choose “Yes” for “Restrict Viewer Access” and check “Self” for “Trusted Signers”.
- Step 10: Leave all other settings at their defaults.
- Step 11: At the end of the page, click “Create Distribution”.
- Step 12: Once the web distribution has been created successfully, make a note of its “Domain Name” (it will look like d1y2y4u2eb5u8.cloudfront.net), shown in the General information section. We are going to use it later.
Creating CloudFront Key Pairs for Your Trusted Signers
The next step is to create the CloudFront key pair that we are going to use to sign the URLs that grant access to our content.
- Step 1: Sign in to the AWS Management Console using the root credentials for an AWS account.
- Step 2: On the account-name menu, click “Security Credentials”.
- Step 3: Expand “CloudFront Key Pairs”.
- Step 4: Confirm that you have no more than one active key pair. You can’t create a key pair if you already have two active key pairs.
- Step 5: Click “Create New Key Pair”.
- Step 6: In the Create Key Pair dialog box, click “Download Private Key File”.
- Step 7: In the Opening dialog box, accept the default value of Save File, and click OK to download and save the private key for your CloudFront key pair.
- Step 8: Record the key pair ID for your key pair. (In the AWS Management Console, this is called the access key ID.) We are going to need this key pair ID later.
- Step 9: Convert the private key to DER format. You can do this with OpenSSL by running the following command:
$ openssl pkcs8 -topk8 -nocrypt -in origin.pem -inform PEM -out new.der -outform DER
- Step 10: Place the converted private key on the server where Alfresco is installed and note its path.
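Before wiring the key into Alfresco, it can be worth checking that the converted file really is an unencrypted PKCS#8 DER key that Java can read. The following is a minimal sketch using only the JDK; the class name DerKeyLoader is hypothetical, and the file path is whatever you noted in step 10.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.PrivateKey;
import java.security.spec.PKCS8EncodedKeySpec;

// Hypothetical helper: checks that a file produced by "openssl pkcs8 ... -outform DER" loads in Java.
public class DerKeyLoader {

    // Parse unencrypted PKCS#8 DER bytes into an RSA private key.
    static PrivateKey fromPkcs8(byte[] der) {
        try {
            return KeyFactory.getInstance("RSA").generatePrivate(new PKCS8EncodedKeySpec(der));
        } catch (GeneralSecurityException e) {
            throw new IllegalArgumentException("Not an unencrypted PKCS#8 DER-encoded RSA key", e);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("Usage: java DerKeyLoader <path-to-der-file>");
            return;
        }
        // args[0] is the path noted in step 10, for example the new.der file created above.
        PrivateKey key = fromPkcs8(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("Loaded " + key.getAlgorithm() + " key in " + key.getFormat() + " format");
    }
}
```

If the file is still in PEM format, or is encrypted, the parse fails with an IllegalArgumentException, which is a sign that the conversion step needs to be repeated.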
Connecting CloudFront with Alfresco
Now we need to connect Alfresco with CloudFront. Even basic integrations involve writing custom scripts that run with Alfresco, although sometimes simple API modifications suffice. It all ultimately depends on the use case: how you want to use CloudFront, what you want to cache, which of your resources are more critical and require low latency, and so on.
For example, in one of our projects we had to reduce the time it took to display thumbnails, and we did that by serving the thumbnails through CloudFront URLs. We had a very large repository of images, and finding the right image meant browsing the image thumbnails, which, when loaded through the website address, suffered from high latency, especially for Asia Pacific locations.
We overrode the Alfresco API that handles the thumbnails, i.e. “/api/node/{store_protocol}/{store_id}/{node_id}/metadata”, injecting CloudFront URLs as the defaults for thumbnails.
You can build the CloudFront URL using the following statement:
// CloudFrontUrlSigner is part of the AWS SDK for Java (package com.amazonaws.services.cloudfront).
String cloudfronURL = CloudFrontUrlSigner.getSignedURLWithCannedPolicy(
        CloudFrontUrlSigner.Protocol.http, distributionDomain, new File(privateKeyFilePath),
        s3ObjectKey, keyPairId, expiryDate);
Then add this “cloudfronURL” value to the response. In this code:
- “distributionDomain” is the domain name that we noted in step 12 of creating the web distribution.
- “privateKeyFilePath” is the path of the private key file that we placed on the Alfresco server in step 10 of creating the CloudFront key pairs.
- “keyPairId” is the key pair ID that we recorded in step 8 of creating the CloudFront key pairs.
- “s3ObjectKey” is the key of the object in the S3 bucket, and “expiryDate” is the date and time after which the signed URL stops working.
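To make it clearer what a signed URL with a canned policy actually contains, here is a minimal sketch that builds one using only the JDK. It follows my understanding of the format described in the CloudFront documentation (a small JSON policy, an RSA-SHA1 signature, and three query parameters); it is not the SDK’s implementation and is not meant to replace the SDK call. The class name CannedPolicySketch, the object path, and the key pair ID are hypothetical, and the demo signs with a throwaway key instead of your real private key.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.Signature;
import java.util.Base64;

// Illustrative sketch only; in Alfresco, use the SDK's CloudFrontUrlSigner instead.
public class CannedPolicySketch {

    // A canned policy names one resource and one expiry time (seconds since the epoch, UTC).
    static String cannedPolicy(String resourceUrl, long expiresEpochSeconds) {
        return "{\"Statement\":[{\"Resource\":\"" + resourceUrl
                + "\",\"Condition\":{\"DateLessThan\":{\"AWS:EpochTime\":"
                + expiresEpochSeconds + "}}}]}";
    }

    // Base64 with the characters that are awkward in URLs replaced: '+' -> '-', '=' -> '_', '/' -> '~'.
    static String urlSafeBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data)
                .replace('+', '-').replace('=', '_').replace('/', '~');
    }

    // Sign the policy with RSA-SHA1 and append Expires, Signature, and Key-Pair-Id to the URL.
    static String sign(String resourceUrl, PrivateKey key, String keyPairId, long expiresEpochSeconds) {
        try {
            Signature rsa = Signature.getInstance("SHA1withRSA");
            rsa.initSign(key);
            rsa.update(cannedPolicy(resourceUrl, expiresEpochSeconds).getBytes(StandardCharsets.UTF_8));
            String separator = resourceUrl.contains("?") ? "&" : "?";
            return resourceUrl + separator + "Expires=" + expiresEpochSeconds
                    + "&Signature=" + urlSafeBase64(rsa.sign())
                    + "&Key-Pair-Id=" + keyPairId;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not sign the policy", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Throwaway key for the demo; a real URL must be signed with the CloudFront private key.
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        PrivateKey demoKey = generator.generateKeyPair().getPrivate();

        long oneHourFromNow = System.currentTimeMillis() / 1000 + 3600;
        // Hypothetical object path and key pair ID, for illustration only.
        System.out.println(sign("http://d1y2y4u2eb5u8.cloudfront.net/thumbs/example.png",
                demoKey, "EXAMPLE_KEY_PAIR_ID", oneHourFromNow));
    }
}
```

Because the expiry time is part of the signed policy, nobody can extend the lifetime of a URL by editing the Expires parameter; doing so invalidates the signature.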
The response of the above API is received in Share by the “web-preview” web script when the user navigates to the details page of the content. So now you need to modify the file “web-preview.get.js” to set “cloudfronURL” in the “WebPreview” widget:
var webPreview = {
   id : "WebPreview",
   name : "Alfresco.WebPreview",
   options : {
      siteId : this.page ? page.url.templateArgs.site : (args.site != null ? args.site : ""),
      thumbnailModification : nodeMetadata.thumbnailModifications,
      nodeRef : model.nodeRef,
      cloudfronURL : nodeMetadata.cloudfronURL,
      name : nodeMetadata.name,
      mimeType : nodeMetadata.mimeType,
      size : nodeMetadata.size,
      thumbnails : nodeMetadata.thumbnails,
      pluginConditions : pluginConditionsJSON,
      api : model.api,
      proxy : model.proxy
   }
};
Now modify the “web-preview.js” file: add “cloudfronURL” to the options list, and then update the “getContentUrl” function to return the CloudFront URL by placing the statement “return this.options.cloudfronURL;” in it.
And voilà, you are done. The web preview will now fetch thumbnails through CloudFront URLs.
This is obviously just one use case for CloudFront, and the possibilities are limitless. However, as you may have guessed, each use case requires slightly different tweaking and coding.
Increase the speed some more?
Connecting CloudFront with your Alfresco, moving the repository to S3, and using Amazon EC2 as your Alfresco hosting server are the main ways to increase your application’s performance using the power of the Amazon cloud. If you need to further increase the speed and overall performance, you can either invest in a more powerful Amazon instance or optimize your Alfresco installation and its related processes. I will be working on a post listing everything you can do to increase the performance of Alfresco-based solutions. Even though I know there are tons of posts on this topic and there is really good Alfresco documentation available, I look forward to hearing from all of you. A fresh, experienced perspective is always a good idea. Comment, tweet, or mail, whichever you prefer.
References: aws.amazon.com, docs.aws.amazon.com, docs.alfresco.com