Short links

October 26, 2022 Reading time: 6 minutes

Popularised by services like bit.ly, short links provide a means of reducing the length of a URL to make it easier to deal with character limit constraints imposed by technology like SMS or Twitter. I've written this article as a go-to reference after one of my clients wanted an easier way to convert a long-form URL into a shortened one for easier distribution.

Shortening links effectively follows a dehydrate/hydrate process, where a unique key is created and used to store a value. This allows us to shorten a long link like https://kyc.intergreatme.com/za/igm.demo/?txId=xxxxd42b-6633-457f-a4d8-9b9d07cdxxxx&originTxId=yyyy656d-2536-4573-82dd-cc26ff09yyyy to https://go.intergreatme.com/xyd82c

Note: the links in this article are for illustration purposes only and do not link to anything in the Intergreatme environment. I also use tx_id and origin_tx_id, whereas the URI uses txId and originTxId.

When the user navigates to our short link, the code will do a lookup on the key and then rehydrate the value as necessary. Once the link has been rehydrated, an HTTP redirect is used to navigate the user to the intended destination.

The KYC demo at Intergreatme follows this link shortening process to create a smaller URI to be sent via SMS.

The dehydration strategy involves taking our unique identifiers (the tx_id and origin_tx_id) and combining them into a smaller short_id.

<?php
    if(!empty($_GET)) {
        // the first query-string key is the short_id, e.g. /?xyd82c
        $short = array_keys($_GET)[0];
        try {
            // the optional ?u= parameter tells us how the user reached the link
            $source = 'ACCESS_UNKNOWN';
            if(isset($_GET['u'])) {
                if($_GET['u'] == 'm') {
                    $source = 'ACCESS_QRCODE';
                } elseif ($_GET['u'] == 'e') {
                    $source = 'ACCESS_EMAIL';
                } elseif($_GET['u'] == 's') {
                    $source = 'ACCESS_SMS';
                } else {
                    $source = 'ACCESS_UNKNOWN_SOURCE';
                }
            }
            // look up the transaction behind this short_id; the latest entry wins
            $tx = DB::Fetch('SELECT tx_id, origin_tx_id, company_uuid, created_on FROM rkyc_allowlist_items WHERE short_id = ? ORDER BY created_on DESC', array($short));
            if(empty($tx)) {
                // unknown short_id: send the user to the site root instead of a broken URL
                header('Location: ../');
                exit;
            }
            // rehydrate the full URI and redirect
            $uri = 'https://kyc.intergreatme.com/za/?whitelistId='.$tx['tx_id'].'&originTxId='.$tx['origin_tx_id'];
            header('Location: '.$uri);
            echo 'If the server did not redirect you, please click here: '.$uri;
            exit;
        } catch(Exception $ex) {
            error_log($ex->getMessage(), 0);
        }
    } else {
        header('Location: ../');
    }
?>

The short_id is a unique field and does not allow duplicates, which could cause an issue if two people get the same short_id and both of their transactions are still active. The system would not know whose transaction to redirect to.

I currently use PHP's uniqid() function to create a somewhat unique identifier and rely on a unique constraint in the database when attempting to insert the short_id.

I adopted this approach knowing the potential limitations of uniqid() and recognising that this system won't be creating many of these identifiers. If you're unsure, check the PHP manual to see whether uniqid() is suitable for your purposes.

Our dehydration strategy therefore involves creating a unique identifier that acts as the key to the corresponding value. How flexible your strategy needs to be will ultimately impact your design methodology.

In my particular use case, I have opted to store the tx_id and origin_tx_id in separate columns and dynamically build the redirect URL at runtime. I could just as easily have stored the entire encoded URI as the value instead, but that would hamper any long-term modifications: if part of the URL needed to change, I would have to rewrite the stored values if this were a simple key-value pairing.

So my table has a short_id, origin_tx_id, and tx_id, along with a created_on timestamptz column. I always add a created_on timestamp column as I find it useful to know when a particular transaction was created. I tend to prefer an insert-only approach when interacting with databases vs. an update strategy, mostly because it is faster to insert a record than to update one, but also because tables become self-auditing. I can see the list of actions that were taken by simply looking at the inserts, and I use the created_on to access the latest item. And for warehousing, I can just select the entries by date and move them to the data warehouse.
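
To make the dehydration side concrete, here is a minimal sketch of creating a short link. DB::Execute is a hypothetical helper in the same style as the DB::Fetch call used above, and the variables holding the transaction identifiers are placeholders:

// dehydrate: generate a short_id and store it against the transaction identifiers
$short_id = uniqid();
// the unique constraint on short_id will reject a duplicate insert;
// if that happens, generate a new uniqid() and try again
DB::Execute('INSERT INTO rkyc_allowlist_items (short_id, tx_id, origin_tx_id, created_on) VALUES (?, ?, ?, now())', array($short_id, $tx_id, $origin_tx_id));
// this is the short link that gets sent out via SMS
$link = 'https://go.intergreatme.com/'.$short_id;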

From here, the process is pretty straightforward. When the user navigates to the short link URL, I do a database query based on the path parameter, retrieve the relevant information, and dynamically build the target URL.

I then use headers to redirect the user to the new location. I generally add some additional metadata to the URL based on the use case. For example, I'll add some GET parameters: ?u=s for SMS, ?u=e for email, and ?u=q for QR Code. This allows me to determine how the user engaged with the platform. And because it maps back to the tx_id and origin_tx_id of the user, I also know which user performed the action.

Using a short link process gives us some obvious benefits:

It gives us the ability to reduce the length of a URL.

We're able to track the number of times an external resource is accessed via our shortened link.

It is easier to alter the location of the resource without impacting the user. This is probably one of the more useful benefits of using short links.


Creating PDF documents using wkhtmltopdf

October 17, 2022 Reading time: 2 minutes

I often use wkhtmltopdf to generate a PDF from web content with a common use case being the generation of invoices. More recently, I have also used it to generate data remediation documents to help businesses comply with some of their regulatory requirements.

Here's the link https://wkhtmltopdf.org/

The project consists of two utilities, one to generate PDFs and the other to generate images. wkhtmltopdf describes itself as:

... an open source (LGPLv3) command line tool to render HTML into PDF and various image formats using the Qt WebKit rendering engine. It runs entirely "headless" and does not require a display or display service.

However, on our Linux virtual machines sitting in Azure, I've always installed xvfb (X virtual framebuffer) to provide a virtual graphics buffer, perfect for those VMs that don't have a graphics adapter or screen.

It is easy enough to install from the command line using:

sudo apt install xvfb

Then, install wkhtmltopdf

sudo apt install wkhtmltopdf

In most cases, I access this via an exec call from whichever programming language I'm using, although you should always be careful about what you are executing before actually running these commands!

// build the command that will run via PHP.
// escapeshellarg() sanitises $path and $filename before they reach the shell.
$wkcmd = 'xvfb-run wkhtmltopdf '.escapeshellarg($path).' '.escapeshellarg($filename);
// execute the command
exec($wkcmd);
// In this instance, I'm using PDFBox to encrypt the output PDF with the user's ID number.
exec('java -jar pdfbox-app-2.0.21.jar Encrypt -U '.escapeshellarg($pii['id_number']).' '.escapeshellarg($filename));
// do whatever you need to do with the generated PDF, i.e. email it, store it.
// don't forget to clean up the file system afterwards!

Most of the code I have written uses PHP to generate and render the HTML into a PDF in a temporary directory, and if necessary, makes use of PDFBox to apply additional transformations, like password protecting the PDF.
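
As a rough illustration of that flow, the sketch below writes the rendered HTML to a temporary file, converts it, and cleans up afterwards. renderInvoiceHtml(), $invoiceData and $invoiceId are placeholders for your own templating and data:

// render the HTML using your own templating (placeholder function)
$html = renderInvoiceHtml($invoiceData);
// write it to a temporary file that wkhtmltopdf can read
$htmlFile = sys_get_temp_dir().'/invoice-'.$invoiceId.'.html';
$pdfFile  = sys_get_temp_dir().'/invoice-'.$invoiceId.'.pdf';
file_put_contents($htmlFile, $html);
// convert the HTML to PDF under xvfb, as before
exec('xvfb-run wkhtmltopdf '.escapeshellarg($htmlFile).' '.escapeshellarg($pdfFile));
// apply any PDFBox transformations here, then email or store the PDF
// finally, remove the temporary HTML file
unlink($htmlFile);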


Setting up Cockroach DB

September 20, 2022 Reading time: 11 minutes

Cockroach DB is a scalable clustered database that Intergreatme makes use of for a significant amount of our data storage needs. While we still currently use PostgreSQL for our mobile apps and Cassandra for document storage, the engineering team is also working towards replacing both with Cockroach.

I have also been using Cockroach DB as the go-to database for my own projects at Intergreatme. I make extensive use of it with the customer-facing portal I created as part of the KYC demo - which I ultimately re-worked to provide a front end for Telkom, Lalela and Angelshack to make use of as part of a delivery work-from-home validation program.

Note: I use this to set up a single instance node of Cockroach DB on Linux. A lot of what is mentioned here can also be used when setting up a cluster.

1. Create a Cockroach user

It's best to have Cockroach DB run as its own user. Use this command to set up the user. If you are following the Cockroach DB manual, remember that you will need to make use of sudo -u cockroach prior to running any additional configuration commands when interacting with Cockroach DB.

sudo adduser --system --disabled-password --disabled-login --no-create-home --shell /bin/false --group cockroach

2. Download Cockroach DB

Use the following command to both download and extract Cockroach DB.

Note: you might need to install cURL first by running: sudo apt install curl

curl https://binaries.cockroachdb.com/cockroach-v22.1.7.linux-amd64.tgz | tar -xz && sudo cp -i cockroach-v22.1.7.linux-amd64/cockroach /usr/local/bin/

If you don't have or want to install cURL, you could replace it with wget but it becomes a two-step process:

wget -qO- https://binaries.cockroachdb.com/cockroach-v22.1.7.linux-amd64.tgz | tar -xz
sudo cp -i cockroach-v22.1.7.linux-amd64/cockroach /usr/local/bin/

3. Create your certificate files

Note: the Cockroach DB manual says it's best to create these files on your local machine and to only copy the certificates that are necessary to the actual host machine. I would recommend following the prompts for a secure single-node instance, as you would want any code you write in your local/dev environment to behave the same as in production.

Follow steps 1 to 4 but do not start the node. We will use our systemd service file to accomplish this.

Important! Since we created the cockroach user in step 1, you probably want to prefix all of the commands in the steps above with sudo -u cockroach.

This will ensure any working directories and files have the correct owner applied to them. The assumption here is that you are working from the directory where your files will be stored (based on the systemd configuration file outlined below), i.e. /var/lib/cockroach/

i. Create two directories for certificates.

sudo -u cockroach mkdir certs my-safe-directory

ii. Create the CA (Certificate Authority) certificate and key pair:

sudo -u cockroach cockroach cert create-ca --certs-dir=certs --ca-key=my-safe-directory/ca.key

iii. Create the certificate and key pair for the node:

sudo -u cockroach cockroach cert create-node localhost $(hostname) --certs-dir=certs --ca-key=my-safe-directory/ca.key

iv. Create a client certificate and key pair for the root user

sudo -u cockroach cockroach cert create-client root --certs-dir=certs --ca-key=my-safe-directory/ca.key

4. Create a cockroach.service file

I generally make use of systemd to provide a mechanism for controlling Cockroach DB. This allows me to more easily start/stop/restart it and enable it to run as a service.

First, define the cockroach.service file. I tend to create this in my home directory first before copying / moving it to /etc/systemd/system/

[Unit]
Description=Cockroach Database cluster node
Requires=network.target
[Service]
Type=notify
WorkingDirectory=/var/lib/cockroach
ExecStart=/usr/local/bin/cockroach start-single-node --certs-dir=/var/lib/cockroach/certs --advertise-addr=localhost --cache=.25 --max-sql-memory=.25
TimeoutStopSec=60
Restart=always
RestartSec=10
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=cockroach
User=cockroach
[Install]
WantedBy=default.target

Update the file in a way that best fits your environment. For instance, you might want to store your working directory in a different location to where your certs are stored. You might also want to mount a different drive and have Cockroach persist data there to reduce IOPS on the operating system disk, but also in case you ever need to detach or expand the drive (we use Azure to run our infrastructure).

In certain environments, I also add the following to my cockroach.service file to delay the start-up of Cockroach. This is particularly useful when the Azure agent is busy cleaning up swap memory at first boot, when resource usage can be quite high until everything has fully initialised.

ExecStartPre=/bin/sleep 60

Once you are happy with your file, move it into place: sudo mv cockroach.service /etc/systemd/system/

Try and start the service to see if it will run: sudo systemctl start cockroach

It's also useful to remember to use sudo systemctl status cockroach and sudo journalctl -xe when things go wrong.

5. Create a new user

You should not be connecting to your database using the root user. Use the following steps to create a new user.

Start the built-in SQL shell

sudo -u cockroach cockroach sql --certs-dir=/var/lib/cockroach/certs

Create the user, any database(s) you want to work with, and grant permissions for the user. For instance, if you want your user to be db_user, you'd use:

CREATE USER IF NOT EXISTS db_user;
CREATE DATABASE mydb;
GRANT ALL ON DATABASE mydb TO db_user;

Then, you want to create a cert file for the user. Assuming you are running the following command from /var/lib/cockroach/:

sudo -u cockroach cockroach cert create-client db_user --certs-dir=certs --ca-key=my-safe-directory/ca.key
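
With the client certificate in place, your application can connect over the PostgreSQL wire protocol. The sketch below uses PHP's PDO pgsql driver; the certificate paths and database name are assumptions based on the steps above, and in practice you may need to copy the client certificate and key somewhere the application's user can read them (libpq expects the key file to be readable only by that user):

// connect to the local single-node instance on CockroachDB's default SQL port (26257)
$dsn = 'pgsql:host=localhost;port=26257;dbname=mydb;'
     . 'sslmode=verify-full;'
     . 'sslrootcert=/var/lib/cockroach/certs/ca.crt;'
     . 'sslcert=/var/lib/cockroach/certs/client.db_user.crt;'
     . 'sslkey=/var/lib/cockroach/certs/client.db_user.key';
$db = new PDO($dsn, 'db_user');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// quick sanity check
echo $db->query('SELECT now()')->fetchColumn();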

Final words

I use the process above as part of a fairly quick deployment mechanism when we create new instances that require standalone databases. The installation process is quick, and setting up an instance for single node access is pretty easy.

If you do encounter problems, they are usually to do with file permissions, either on a directory or on a particular file.

You can use the following to recursively change the folder owner

sudo chown -R cockroach: . 

Or make use of the following to change the access permissions:

sudo chmod 600 node.key
sudo chmod 600 client.node.key


Upcoming changes to Caelisco

September 13, 2022 Reading time: ~1 minute

If you are an old visitor to caelisco.net you might notice that I have removed the old (and fairly generic) landing page to focus purely on content. This departure came from an analysis of entry/exit points in my Matomo tracking: the majority of people enter the site through searches for particular terms. Social media also plays a part, given this is where I will sometimes link content to drive traffic.

The original blog posts are currently available at caelisco.net/blog/ but the plan is to perma-delete that part of the website. The current objective is to review existing content, determine whether it is still valid (read: useful) and to establish what needs to be updated in the post.

As an example, the content on setting up a single instance node of Cockroach DB is a popular post, but it needs some tender loving care, and I also need to remove some of the problems that I introduced in the original post. Plus, a lot has changed since I wrote that post, with Cockroach going from v20.1.7 through to v22.1.6!

The expected changes won't be instantaneous, so there will be some delay as content moves across. I have already configured the web server to redirect unknown paths on the caelisco.net domain to the home page.


A new start

September 1, 2022 Reading time: 5 minutes

Hi there, and welcome to my little corner of the Internet which I call Caelisco.

My name is James Lawson, and I'll be your host while you're here. I would describe myself as a techie with a passion for technology, but I am similarly really interested in business (yes, I have a subscription to Harvard Business Review and regularly read articles from there).

Other interests include travel, cooking, and photography. I am also an avid gamer, with my current favourite games being Oxygen Not Included (1000+ hours sunk in), Starbound, and Avorion. Another favourite is Valheim, as it provides me with a space to chat with my friends around the world, while working collaboratively within the build / fight / explore style of gameplay. I'm also a fan of Elite Dangerous, and The Forest. Somehow I've acquired 246 games on Steam.

My current objective(s) with the website are:

I am trying to reinvigorate the domain by reorienting the content and pages, while also *trying* to write up some content. I'm not going to be setting myself a goal of weekly writing, but I would like to aim for a once-a-month dialogue.

The second objective is to promote myself a bit more and build up some brand equity. I'm interested in helping other entrepreneurs, start-ups or SME businesses with tech-focused advisory work.

Throughout my career, which has been varied to say the least, I've picked up quite a few skills that sit at the intersection of Business and Information Technology.

I've worked in everything from being a teller and enquiries clerk at Absa bank, to writing as a tech journalist, to technology facilitation/management, and lastly CTO/CIO/Interim CEO at a company I co-founded in 2016.

My particular focus areas are in ideation, innovation, process re-engineering, and taking conceptual ideas and building on them. I find having a fairly strong logic system and a fairly good imagination helps me construct abstract ideas into something more concrete.

My current role is Interim Chief Executive Officer for Intergreatme, a company I helped co-found in 2016. I started as the Chief Technology Officer, then, prior to us working with some of the big brands in South Africa, shifted gears to become the Chief Information Officer. I took over the role of Interim CEO at the end of 2021.

Intergreatme has been a great learning experience, and it has also been rewarding in the sense that it is an award-winning business. There are still many things to build up on.

When it comes to coding, I am a bit of a polyglot, though I've come to prefer writing code in either Go or PHP. One of the main reasons is that my expertise domain has shifted, so when I do write code, it is usually to prototype an idea or concept. I find it easier to explain or show a concept to stakeholders or a dev team when it has been quickly prototyped. My local dev folder is littered with code, and some of these proof-of-concept programs are still running in production today. Oops!

Why Caelisco?

Caelisco is a name I made up by mashing two Latin words together, one meaning web and the other universe. At the time, I thought it apt, because the Internet is a universe that is linked together much like the web from a spider. I can't quite recall which two words in particular I used, but it was when I was living in London (2005 - 2007).