What are a Real-Time or In-Memory database and data persistency with in-system RAM with real use case?

Nowadays lots of applications(Twitter, Facebook, Google search, etc….)  are running which perform millions of transactions in few seconds and adding a large volume of data into the system memory, when people further wants to utilize this data for their analytics then it might be really a big challenge to get the faster performance to save time and energy.

In this article, we will understand In-memory or Real-time database with below table of content -


Table of content
1.What is Real-Time or In-Memory Database
2.How Real-Time Database works
3.Durability in Real-Time Database
4.What are the Performance Improvement techniques
5. How to handle, When Real-Time Database doesn’t fit in memory System RAM
6.Comparison: Real-Time Vs. Relational Database
7.Use Case of Real-Time Database 
  1. What is Real-Time or In-memory Database?

In a Real-time database, data exists in the system main memory or RAM, and a separate WAL file store into System hard disk for all DML & DDL operation performed earlier, This WAL file is used by a real-time database engine to repopulate data back to the system RAM or Main memory at the time of system restart or crash. The real-time database provides faster performance as compare to any other database engine because its data exist in the system’s main memory. 

(see below image for in-memory database design) 

Real-time database processing

(Image 1 - Real-time database processing)

Let’s have a look at below comparison report for data fetching performance between data storing into System hard disk, SSD, and RAM. It should give you a clear picture of data fetching speed for storing data into Real-time or In-memory database data structure.


(see below image for data fetching speed between System Hard Disk, SSD, and RAM) 

Data fetching speed in Real-time database

(Image 2 - Data fetching speed in Real-time database)


As you see in the above image, data fetching speed will be increased by 70 times if data is stored into the System RAM, and this is a  huge difference. As an example, we can say that if any query is taking 70 times for data fetching from hard disk then it will take only 1 sec to fetch the same data from System  RAM.

Real-time/In-memory database examples 

  • AEROSPIKE
  • SAP
  • VOLT DB
  • REDIS LABS
  • SQLite
  • Xeround Inc.
  • Boiler Bay Inc
  • HSQL Development Group
  • MC Object
  • ArangoDB GmbH
  • Microsoft SSAS Tabular

Based on your case and business requirement you can choose the best in-memory database from the above list.

  1. How Real-Time Database works?

As Real-time or In-memory database definition, data is stored into the system’s main memory or RAM, and a WAL file is stored into the system’s hard disk for all the DML or DDL operations performed on the system’s main memory. 


In a Real-time database, Data is arranged into System RAM using AVL tree, and as we know about the AVL tree that it always stores data with Pointers rather than references and max data nodes height can be 1, and that‘s the reason the Query engine returns the result back to the end-user in very less time.


To understand this better, let’s suppose if any user fires a query in a Real-time database platform, then the query engine doesn’t require to and scan the system hard disk for results because data exists into System main memory or RAM in this case and Query engines can directly fetch data from system RAM and can save lots of scanning time to searching the results as compared to hard disk.

  1. Durability in Real-Time Database

You might have a question that how Real-Time Database provides durability to the stored data in system RAM because as we know System RAM is a non-volatile memory and how it can save data after each time the system restarts or system crashes?


Well, to understand this better, let’s first understand about WAL file. 


WAL file is called ‘Write Ahead Log’ which exists into the System hard disk and contains logs for all DML and DDL operations performed over the data. In case if any record is updated multiple times then there will be multiple entries for that into the WAL file.


So when the system restarted or crashed then Real-time or In-memory database engine executes each & every entry that exists into the WAL file and in this way, data will be repopulated back to the System RAM, that’s how durability is achieved in a Real-time database.


When a Real-time database engine is processing WAL file execution for repopulating data back to the system RAM or main memory then it might take some time depending on the size of the WAL file.

  1. What are the Performance Improvement techniques in a Real-time database?

Usually, performance improvement is not required in a Real-time database because it already arranges data in pointers with the AVL tree implementation, but there might be some other painful scenario that indirectly can affect the data fetching speed or query performance.


Let’s suppose the size of the WAL file has increased and it reaches the threshold or almost equal to the size of the hard disk then in this situation your RAM performance might get down, so we will understand few common performance improvement techniques to overcome from these kinds of the issue -

  1. Comp Action

To understand this better, let’s suppose when we execute the entire WAL file then all the older entries will also execute, and later they will be overridden by the latest DML & DDL log entries, so there is no use to keep old log entries into the WAL file.

In Comp action activity, we clear older log entries from the WAL file and keep only the latest sets of logs to save available System hard disk space, because the execution of the latest set of entries will repopulate the same data back to the RAM.
  1. Delete Logs older than Snapshot period

We can create a database snapshot and clear the entire WAL file log entries to save hard disk space.


To understand this better, let’s support we created a Real-time database snapshot at the time interval ‘T’ and clear all the log entries from the WAL file which had created earlier to this ‘T’ time interval, and whenever required, this snapshot can be utilized to repopulate the data rather than WAL file.

  1. Reducing the data size 

We can work on the below simple suggestion -

    • Read-only required columns, don’t use * in your select queries
    • Avoid large text columns such  as description
    • Use single char rather than full text, such as gender column, Male will be M, Female will be F
    • Keep older/historical or less used data into hard disk rather than RAM
  1. How to handle, When Real-Time Database doesn’t fit in System RAM?

There might be one situation when the current data size is increasing drastically and very soon it will be more than the available system RAM or main memory, at that time, the Real-time database will not work and fail due to out of memory exception.


In a general study, it is found that a RAM size can be set up up to 400 GB, but what to do when your Real-time database size is more data than the system RAM. We will talk about few possible solutions as below -

  1. Money Costing Solution 

      • Add more RAM and robust infrastructure
      • Rent a Cloud or Virtual memory
                If the budget allows, then you can switch to better and robust infrastructure.
  1. Time Costing Solution 

      • Switch to Hard Disk
      • Eviction Algorithm of Data from Main Memory to Disk

        If you can afford slow performance and time is not any constraint then you can think about this            Time costing solution.

  1. Compression

For the information, please do not confuse with the word compression as it we are not going to create any kind of compressed or zip file. In compression, we will play with data with simple techniques. Mainly there are two types of compression as below-
    • Lossless compression 

In lossless compression,  there is no change in the outcome, and the meaning of data is going to be the same between prior and after this compression. There are below tips for Lossless compression 

      • Load only required columns
      • Manipulate data types
      • Avoid large text columns such as description
      • Column Sparse (Avoid loading NULL/NAN rows)
      • Use single char rather than full text, such as gender column, Male will be M, Female will be F
    • Lossy compression
      • Load only required columns
      • Modifying Integer data type
      • Select only Top N or Bottom N Rows

Lossy compression might be causing potential loss of data or compromise in the data quality or accuracy. As per the best practice, you should perform an end-to-end impact analysis and confirm if your business users are fine with all those impacts.


However, I don’t recommend Lossy compression as a performance improvement technique because it impacts the existing data and outcome and most of the time your business users won’t like that.

  1. Comparison: Real-Time Vs. Relational Database?

You might have another question, whether a Real-time or relational database should be used in your project? let’s see below Real-time or In Memory database comparison with Relational database on certain parameters, it should help answer your query better -


Sr.No.

Real-Time Database

Relational Database

Win

1

High Cost

Low Cost

Relational

2

Volatile  *NVRAM

Durable

Relational

3

Crash Recovery using WAL

Not required

Relational

4

High performance

Low Performance

Real-Time

5

AVL(Range queries)

B+ Tree

Real-Time

6

No Serialization required(Pointers)

Serialization required

Real-Time


In the above comparison table between Real-time vs. Relational database, you can see both Real-Time and Relational databases have their benefits, but major in-memory database benefits are Performance, Data arrangements using pointers, and ease of data store without serializations.

you can choose between a Real or Relational database for your project wisely by keeping the above parameters in your mind. 


The major concerning area with Real-time or In-memory database is the Cost, data durability, and persistency or you can say major Real-time or in-memory database disadvantages.


7. Use Case of Real-Time Database?

This is the last topic in this article and here we will discuss one practical use case in which only a Real-time or in-memory database can be implemented, and any other relational database might not work up to the expected performance.


In the use case, we are going to talk about the Google Search engine and the Display of the right advertisement to the user’s browser. So how it works? 

The answer is when a user search for a product in google using internet then google create and stored these cookies into user local hard disk and when the same user next time browse the internet then google refer these cookies and display similar advertisement, and in this way, everyone’s will get benefits out of it, for example - User would get desired product, Seller would sell that product and Google will also get some commission.


Let’s have a look at the below image for detailed discussion -

Real-time database use-case

(Image 3 - Real-time database use-case)


Google engine use to pass these cookies into ‘Real-Time Bidding Exchange’ system as called RTBE. RTBE has two sides, one side has billions of advertisements and on another side, there are billions of users available. RTBE’s role is to select the right advertisement and pass it to the right user so that users can get more options to buy their desired product.


Let’s suppose any user is searching for a laptop with some particular specifications, based on this user search activity, google engine will create and store cookies for this into user local system hard disk and when the same user next time do the browsing or read any content over the internet then RTBE will show a related advertisement to the user’s browser window.


For an understanding purpose let’s think in this way, user browser total takes 250 milliseconds to render and appear to the user’s browser window, 250 millisecond is a very small time frame and it might be possible that out of 250, 200 milliseconds is the minimum required for user’s browser rendering, so if we see, RTBE system has only 50 milliseconds to perform the entire operation end to end for utilizing the user’s stored cookies, then selecting the right advertisement and pass it to the right user. 

To perform this operation in such a short time frame(50 MS), it would be required a high-performance database system and a real-time database would be the right choice rather than any other Database system.


Conclusion

I Hope In-memory or Real-time database explained in this article provides good information, I always try to provide all end-to-end information about any topic under one post only so my readers don’t need to look into other websites for the rest of the information. You can also check the video explanation of this article in the below link.


Video link of this Article


Feel free to share your suggestions using the comment box.


If you get a good learning on this topic then please like and subscribe to my blog for more such topics.


Thanks.

Post a Comment

0 Comments