In this article, we will understand In-memory or Real-time database with below table of content -
1.What is Real-Time or In-Memory Database
2.How Real-Time Database works
3.Durability in Real-Time Database
4.What are the Performance Improvement techniques
5. How to handle, When Real-Time Database doesn’t fit in memory System RAM
6.Comparison: Real-Time Vs. Relational Database
7.Use Case of Real-Time Database
What is Real-Time or In-memory Database?
(see below image for in-memory database design)
Let’s have a look at below comparison report for data fetching performance between data storing into System hard disk, SSD, and RAM. It should give you a clear picture of data fetching speed for storing data into Real-time or In-memory database data structure.
(see below image for data fetching speed between System Hard Disk, SSD, and RAM)
(Image 2 - Data fetching speed in Real-time database)
As you see in the above image, data fetching speed will be increased by 70 times if data is stored into the System RAM, and this is a huge difference. As an example, we can say that if any query is taking 70 times for data fetching from hard disk then it will take only 1 sec to fetch the same data from System RAM.
Real-time/In-memory database examples
- AEROSPIKE
- SAP
- VOLT DB
- REDIS LABS
- SQLite
- Xeround Inc.
- Boiler Bay Inc
- HSQL Development Group
- MC Object
- ArangoDB GmbH
- Microsoft SSAS Tabular
Based on your case and business requirement you can choose the best in-memory database from the above list.
How Real-Time Database works?
As Real-time or In-memory database definition, data is stored into the system’s main memory or RAM, and a WAL file is stored into the system’s hard disk for all the DML or DDL operations performed on the system’s main memory.
In a Real-time database, Data is arranged into System RAM using AVL tree, and as we know about the AVL tree that it always stores data with Pointers rather than references and max data nodes height can be 1, and that‘s the reason the Query engine returns the result back to the end-user in very less time.
To understand this better, let’s suppose if any user fires a query in a Real-time database platform, then the query engine doesn’t require to and scan the system hard disk for results because data exists into System main memory or RAM in this case and Query engines can directly fetch data from system RAM and can save lots of scanning time to searching the results as compared to hard disk.
Durability in Real-Time Database
You might have a question that how Real-Time Database provides durability to the stored data in system RAM because as we know System RAM is a non-volatile memory and how it can save data after each time the system restarts or system crashes?
Well, to understand this better, let’s first understand about WAL file.
WAL file is called ‘Write Ahead Log’ which exists into the System hard disk and contains logs for all DML and DDL operations performed over the data. In case if any record is updated multiple times then there will be multiple entries for that into the WAL file.
So when the system restarted or crashed then Real-time or In-memory database engine executes each & every entry that exists into the WAL file and in this way, data will be repopulated back to the System RAM, that’s how durability is achieved in a Real-time database.
When a Real-time database engine is processing WAL file execution for repopulating data back to the system RAM or main memory then it might take some time depending on the size of the WAL file.
What are the Performance Improvement techniques in a Real-time database?
Usually, performance improvement is not required in a Real-time database because it already arranges data in pointers with the AVL tree implementation, but there might be some other painful scenario that indirectly can affect the data fetching speed or query performance.
Let’s suppose the size of the WAL file has increased and it reaches the threshold or almost equal to the size of the hard disk then in this situation your RAM performance might get down, so we will understand few common performance improvement techniques to overcome from these kinds of the issue -
Comp Action
Delete Logs older than Snapshot period
We can create a database snapshot and clear the entire WAL file log entries to save hard disk space.
To understand this better, let’s support we created a Real-time database snapshot at the time interval ‘T’ and clear all the log entries from the WAL file which had created earlier to this ‘T’ time interval, and whenever required, this snapshot can be utilized to repopulate the data rather than WAL file.
Reducing the data size
We can work on the below simple suggestion -
- Read-only required columns, don’t use * in your select queries
- Avoid large text columns such as description
- Use single char rather than full text, such as gender column, Male will be M, Female will be F
- Keep older/historical or less used data into hard disk rather than RAM
How to handle, When Real-Time Database doesn’t fit in System RAM?
There might be one situation when the current data size is increasing drastically and very soon it will be more than the available system RAM or main memory, at that time, the Real-time database will not work and fail due to out of memory exception.
In a general study, it is found that a RAM size can be set up up to 400 GB, but what to do when your Real-time database size is more data than the system RAM. We will talk about few possible solutions as below -
Money Costing Solution
- Add more RAM and robust infrastructure
- Rent a Cloud or Virtual memory
Time Costing Solution
- Switch to Hard Disk
- Eviction Algorithm of Data from Main Memory to Disk
If you can afford slow performance and time is not any constraint then you can think about this Time costing solution.
Compression
- Lossless compression
In lossless compression, there is no change in the outcome, and the meaning of data is going to be the same between prior and after this compression. There are below tips for Lossless compression
- Load only required columns
- Manipulate data types
- Avoid large text columns such as description
- Column Sparse (Avoid loading NULL/NAN rows)
- Use single char rather than full text, such as gender column, Male will be M, Female will be F
- Lossy compression
- Load only required columns
- Modifying Integer data type
- Select only Top N or Bottom N Rows
Lossy compression might be causing potential loss of data or compromise in the data quality or accuracy. As per the best practice, you should perform an end-to-end impact analysis and confirm if your business users are fine with all those impacts.
However, I don’t recommend Lossy compression as a performance improvement technique because it impacts the existing data and outcome and most of the time your business users won’t like that.
Comparison: Real-Time Vs. Relational Database?
You might have another question, whether a Real-time or relational database should be used in your project? let’s see below Real-time or In Memory database comparison with Relational database on certain parameters, it should help answer your query better -
In the above comparison table between Real-time vs. Relational database, you can see both Real-Time and Relational databases have their benefits, but major in-memory database benefits are Performance, Data arrangements using pointers, and ease of data store without serializations.
you can choose between a Real or Relational database for your project wisely by keeping the above parameters in your mind.
The major concerning area with Real-time or In-memory database is the Cost, data durability, and persistency or you can say major Real-time or in-memory database disadvantages.
7. Use Case of Real-Time Database?
This is the last topic in this article and here we will discuss one practical use case in which only a Real-time or in-memory database can be implemented, and any other relational database might not work up to the expected performance.
In the use case, we are going to talk about the Google Search engine and the Display of the right advertisement to the user’s browser. So how it works?
The answer is when a user search for a product in google using internet then google create and stored these cookies into user local hard disk and when the same user next time browse the internet then google refer these cookies and display similar advertisement, and in this way, everyone’s will get benefits out of it, for example - User would get desired product, Seller would sell that product and Google will also get some commission.
(Image 3 - Real-time database use-case)
Google engine use to pass these cookies into ‘Real-Time Bidding Exchange’ system as called RTBE. RTBE has two sides, one side has billions of advertisements and on another side, there are billions of users available. RTBE’s role is to select the right advertisement and pass it to the right user so that users can get more options to buy their desired product.
Let’s suppose any user is searching for a laptop with some particular specifications, based on this user search activity, google engine will create and store cookies for this into user local system hard disk and when the same user next time do the browsing or read any content over the internet then RTBE will show a related advertisement to the user’s browser window.
For an understanding purpose let’s think in this way, user browser total takes 250 milliseconds to render and appear to the user’s browser window, 250 millisecond is a very small time frame and it might be possible that out of 250, 200 milliseconds is the minimum required for user’s browser rendering, so if we see, RTBE system has only 50 milliseconds to perform the entire operation end to end for utilizing the user’s stored cookies, then selecting the right advertisement and pass it to the right user.
To perform this operation in such a short time frame(50 MS), it would be required a high-performance database system and a real-time database would be the right choice rather than any other Database system.
Conclusion
I Hope In-memory or Real-time database explained in this article provides good information, I always try to provide all end-to-end information about any topic under one post only so my readers don’t need to look into other websites for the rest of the information. You can also check the video explanation of this article in the below link.
Feel free to share your suggestions using the comment box.
If you get a good learning on this topic then please like and subscribe to my blog for more such topics.
Thanks.
0 Comments