The Architecture of an Object Storage System

This architecture was handcrafted by the instaDrop team, and it borrows what we need from the architectures of other existing systems.

Why are object storage systems different? The answer is not clear-cut, but most other systems focus on ensuring the durability of data by being distributed.

In our design, however, the focus is on the speed of reads over writes. You may ask why we are not simply building on a file system. Exposing the file system to users is a bad idea, and reading a file directly from your server is a very bad idea too. It does work in the end, and it serves the purpose of storing the file, but you are then relying on whatever the system offers you, as if you were using a UNIX system.

File systems cope badly with large numbers of files, and lookups through their directory hierarchy are not fast enough.

These issues are introduced by the file system, and object storage systems come to fix them by separating the storage, the entities, and even the authority.

Writes are expensive in every case, so to ensure better reads the system is mostly built for a write-once, read-many workload.

How it looks

We have three services. As for the Auth service, we will discuss it later.

Metadata service

This is where all of the info about the object is stored, including metadata, ACL policy, the ID of the object's location, and the bucket that the object is stored within.

We use it to get to the object's location within the file system directly, without doing a lookup in the file system itself: we ask for the object's UUID by its key (aka filename).
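To make the lookup concrete, here is a minimal in-memory sketch of a metadata record and a key-based lookup. The names (ObjectRecord, MetadataStore, location_id) and the dictionary backing are our own assumptions for illustration; a real service would sit on a database.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectRecord:
    """Hypothetical metadata row; the field names are assumptions."""
    object_id: str    # UUID assigned to the object
    bucket: str       # bucket the object is stored within
    key: str          # the object's key (aka filename)
    location_id: str  # ID of the object's location in the data store
    acl: str          # ACL policy, simplified to a single string


class MetadataStore:
    """In-memory stand-in for the metadata service."""

    def __init__(self):
        self._by_key = {}

    def put(self, bucket, key, location_id, acl="private"):
        record = ObjectRecord(str(uuid.uuid4()), bucket, key, location_id, acl)
        self._by_key[(bucket, key)] = record
        return record

    def lookup(self, bucket, key):
        # One dictionary hit instead of walking a directory hierarchy.
        return self._by_key.get((bucket, key))
```

The point of the sketch is that finding an object costs a single indexed lookup on (bucket, key), no matter how many objects exist.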

Data Store service

This is a regular file system where we store the files, so why is it different?

We store objects as bytes within a bigger file. You may ask why; I will answer that in a moment.

Basically, there is a big difference between a file and an object. In this system, a file is the place where we store objects (aka the files that have been uploaded).

We use the start_offset and object_size to determine an object's location within this file.
The file is usually sized at a few GB. This way we have our own file system architecture that ensures fast reads, and it also avoids wasting disk blocks.
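A minimal sketch of this idea, assuming a plain append-only file on local disk. The function names are hypothetical; only start_offset and object_size come from the design above.

```python
import os


def append_object(path, data):
    """Append an object's bytes to the big file.

    Returns (start_offset, object_size), which is what the metadata
    store would need to remember in order to find the object again.
    """
    with open(path, "ab") as f:
        f.seek(0, os.SEEK_END)   # make the current end of file explicit
        start_offset = f.tell()
        f.write(data)
    return start_offset, len(data)


def read_object(path, start_offset, object_size):
    """Read one object back with a single seek and a single read."""
    with open(path, "rb") as f:
        f.seek(start_offset)
        return f.read(object_size)
```

Calling append_object twice on the same path gives back two non-overlapping (start_offset, object_size) pairs, and read_object with either pair returns exactly the bytes that were written.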

A disk block is typically 4 KB, and when a file smaller than 4 KB uses one, the rest of the space is wasted because the file takes the whole block.
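A quick back-of-the-envelope calculation shows the effect. It assumes 4096-byte blocks and ignores file-system-specific details such as inodes or inline storage of tiny files, so treat the numbers as an illustration only.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes


def allocated_bytes(size, block_size=BLOCK_SIZE):
    """Bytes occupied on disk once size is rounded up to whole blocks."""
    blocks = -(-size // block_size)  # ceiling division on integers
    return blocks * block_size


def wasted_as_separate_files(sizes):
    """Waste when every object is its own file."""
    return sum(allocated_bytes(s) - s for s in sizes)


def wasted_when_packed(sizes):
    """Waste when all objects are packed back to back into one file."""
    total = sum(sizes)
    return allocated_bytes(total) - total


sizes = [1024] * 1000  # one thousand 1 KB objects
print(wasted_as_separate_files(sizes))  # 3072000 bytes wasted
print(wasted_when_packed(sizes))        # 0 bytes wasted
```

With separate files, each 1 KB object leaves 3 KB of its block unused; packed together, only the tail of the last block can be wasted.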

We store the name of this file in the metadata store to know where the object is stored.

We also use a WAL (write-ahead log) to know exactly which file is in the read-write state.
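The record format of the log is not described here, so the following is only a sketch under our own assumptions: each log line is a JSON record with a hypothetical "open" or "seal" event, written and synced before the data file itself is touched. Replaying the log tells us which file is still open for writing.

```python
import json
import os


def log_event(wal_path, event, segment):
    """Append one record to the log and force it to disk."""
    with open(wal_path, "a", encoding="utf-8") as wal:
        wal.write(json.dumps({"event": event, "segment": segment}) + "\n")
        wal.flush()
        os.fsync(wal.fileno())  # the record must be durable before we proceed


def active_segment(wal_path):
    """Replay the log; return the file still in read-write state, if any."""
    if not os.path.exists(wal_path):
        return None
    active = None
    with open(wal_path, encoding="utf-8") as wal:
        for line in wal:
            record = json.loads(line)
            if record["event"] == "open":
                active = record["segment"]
            elif record["event"] == "seal" and record["segment"] == active:
                active = None  # sealed files are read-only from now on
    return active
```

A production log would also need to tolerate a truncated final line after a crash; that is left out to keep the sketch short.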

The Routing ( aka API service )

This is where all routing happens: uploading an object (HTTP PUT), deleting, creating a bucket, and downloading a bucket.

There are two things to consider here. The first is data corruption, which needs to be detected with a checksum (the SHA family, MD5).
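As a small illustration of the checksum idea, the sketch below uses SHA-256 from Python's standard hashlib. The assumption that the client sends the expected digest along with the upload is ours; the function names are hypothetical.

```python
import hashlib


def sha256_hex(data):
    """Hex digest of the received bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_upload(body, expected_hex):
    """True when the received body matches the digest the client sent."""
    return sha256_hex(body) == expected_hex.lower()
```

If verify_upload returns False, the bytes were altered somewhere between the client and the server, and the upload should be rejected instead of stored.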

The second is multi-part upload, for the case where a bigger file is being transferred. Why?

The service relies on HTTP, and HTTP servers and proxies commonly limit the size of a request body. So when the client sends another request to upload the remainder of the object, the system should not treat it as a new object. It should continue the same object, within the file it is stored in.
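One common way to tie several requests to the same object is an upload ID plus numbered parts. The sketch below keeps the parts in memory and is only an assumption about how this could look; the class and method names are hypothetical.

```python
import uuid


class MultipartUploads:
    """Tracks in-progress uploads so later requests extend the same object."""

    def __init__(self):
        self._parts = {}  # upload_id -> {part_number: bytes}

    def start(self):
        """Begin an upload and hand the client an ID to send with each part."""
        upload_id = str(uuid.uuid4())
        self._parts[upload_id] = {}
        return upload_id

    def upload_part(self, upload_id, part_number, data):
        # An unknown upload_id raises KeyError instead of creating a new object.
        # Re-sending a part number simply overwrites the earlier attempt.
        self._parts[upload_id][part_number] = data

    def complete(self, upload_id):
        """Join the parts in part-number order into the final object bytes."""
        parts = self._parts.pop(upload_id)
        return b"".join(parts[n] for n in sorted(parts))
```

Because parts are keyed by number, they can arrive out of order or be retried, and the object is only assembled once complete is called.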

This will be discussed in the next article.

If you have any inquiries or corrections, please message me.
