Yingshaoxo's database design

Initialize

First, create a database.txt file.

Then, each line of that file is a JSON string. Each JSON string represents one dict, which is one record.

Add

When you add a record, you append a new JSON string as a new line at the bottom of that database.txt file.
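
As a minimal sketch of the append step (the function name add_record and its parameters are made up for this example, not part of the original design):

```python
import json


def add_record(database_path: str, record: dict) -> None:
    """Append one record to the database file as a single JSON line."""
    # Mode "a" creates the file if it does not exist and always writes at the end.
    with open(database_path, "a", encoding="utf-8") as file_stream:
        # json.dumps escapes line breaks inside strings, so one record stays on one line.
        file_stream.write(json.dumps(record) + "\n")
```

For example, add_record("database.txt", {"name": "yingshaoxo"}) adds one line to the file.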

Search

When you search for a record, you iterate over all those JSON strings.

For each line, you parse the JSON string and pass the result to a Python function.

If a record is not what you want, the function returns None; if a record is what you want, the function returns a dict result.

In the end, you ignore the None results and get a list of dict objects.
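
The search steps above can be sketched like this. It is only a sketch: the names search, handler and find_adults are assumptions for illustration. It also skips lines that start with '#', which the Delete section uses to mark deleted rows.

```python
import json
from typing import Any, Callable, Optional


def search(
    database_path: str,
    handler: Callable[[dict[str, Any]], Optional[dict[str, Any]]],
) -> list[dict[str, Any]]:
    """Run handler on every record and collect the results that are not None."""
    results = []
    with open(database_path, "r", encoding="utf-8") as file_stream:
        for line in file_stream:
            # Skip empty lines and rows marked as deleted with '#'.
            if line.strip() == "" or line.startswith("#"):
                continue
            result = handler(json.loads(line))
            if result is not None:
                results.append(result)
    return results


def find_adults(record: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Example handler: keep records whose 'age' is at least 18."""
    if record.get("age", 0) >= 18:
        return record
    return None
```

Calling search("database.txt", find_adults) returns a list with only the matching dicts.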

>>> import subprocess
>>> pycode = """
... import sys
... if sys.argv[1] == 'nice_guy':
...     print('yingshaoxo')
... else:
...     print('unrecognized arg')
... """

>>> result = subprocess.run(['python', '-c', pycode, 'nice_guy'], stdout=subprocess.PIPE)
>>> print(result.stdout.decode())
yingshaoxo

Delete

I do not recommend really deleting a line of JSON while another process is reading that file.

I would recommend a special technique instead:

Add a '#' symbol at the beginning of a line to indicate that it is a deleted line.

You can do it by replacing the first character of the line with the '#' symbol.
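
A minimal sketch of this '#' marking, assuming the file is opened in binary mode so that one character of the JSON line is one byte. The names mark_deleted and should_delete are made up for this example.

```python
import json
from typing import Any, Callable


def mark_deleted(
    database_path: str,
    should_delete: Callable[[dict[str, Any]], bool],
) -> int:
    """Overwrite the first byte of every matching line with '#'. Returns the count."""
    deleted_count = 0
    with open(database_path, "rb+") as file_stream:
        while True:
            line_start = file_stream.tell()
            line = file_stream.readline()
            if line == b"":
                break  # end of file
            next_line_start = file_stream.tell()
            # Skip empty lines and lines that are already marked as deleted.
            if line.strip() == b"" or line.startswith(b"#"):
                continue
            if should_delete(json.loads(line.decode("utf-8"))):
                file_stream.seek(line_start)
                file_stream.write(b"#")  # replaces the leading '{', length is unchanged
                file_stream.seek(next_line_start)
                deleted_count += 1
    return deleted_count
```

Because only one byte is overwritten, the file size and the positions of all other lines stay the same.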

Edit

Simply delete the old record line, then add the new version at the bottom of that database.txt file.

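Another way is to use a byte seek technique to overwrite the record in place. It has a limitation: the new data must take exactly as much space as the old data, so for editing it can only be used on fixed-size records. In other words, every row should have the same length. The method below uses the same technique to delete a row by overwriting it with spaces: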

from typing import Any, Callable

def delete(self, one_row_dict_filter: Callable[[dict[str, Any]], bool]):
    """
    one_row_dict_filter: a function that decides whether a row is deleted. If it returns False, we keep that row; if it returns True, we delete that row of data.
    """
    # Binary mode, so that file positions and line lengths are both counted in bytes
    with open(self.database_txt_file_path, "rb+") as file_stream:
        while True:
            previous_position = file_stream.tell()
            line = file_stream.readline()
            if line == b"":
                # readline() returns an empty value only at the end of the file
                break
            current_position = file_stream.tell()

            if line.strip() == b"":
                # ignore empty lines, including rows that were already deleted
                continue

            json_dict = self._json.loads(line.decode("utf-8"))
            if one_row_dict_filter(json_dict):
                # replace the old line with spaces, keeping its line ending
                ending = b"\n" if line.endswith(b"\n") else b""
                file_stream.seek(previous_position)
                file_stream.write(b" " * (len(line) - len(ending)) + ending)
                # go back to the start of the next line before reading again
                file_stream.seek(current_position)
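
To show the fixed-length idea for editing, here is a minimal sketch. It assumes every row is padded with spaces to the same byte width. ROW_WIDTH, encode_row, write_row and read_row are names made up for this example; they are not part of the original code.

```python
import json

ROW_WIDTH = 64  # assumed size of every row in bytes, including the final "\n"


def encode_row(record: dict) -> bytes:
    """Turn a record into exactly ROW_WIDTH bytes, padded with spaces."""
    data = json.dumps(record).encode("utf-8")
    if len(data) > ROW_WIDTH - 1:
        raise ValueError("record does not fit in one fixed-width row")
    return data.ljust(ROW_WIDTH - 1, b" ") + b"\n"


def write_row(database_path: str, row_index: int, record: dict) -> None:
    """Overwrite the row at row_index in place; other rows do not move."""
    with open(database_path, "rb+") as file_stream:
        file_stream.seek(row_index * ROW_WIDTH)
        file_stream.write(encode_row(record))


def read_row(database_path: str, row_index: int) -> dict:
    """Read one row directly by its index, without scanning the file."""
    with open(database_path, "rb") as file_stream:
        file_stream.seek(row_index * ROW_WIDTH)
        # json.loads ignores the trailing padding spaces and the newline
        return json.loads(file_stream.read(ROW_WIDTH).decode("utf-8"))
```

The cost of this design is wasted space for short records and a hard size limit for long ones.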

For multi_process usage

You can write a queue to handle those requests, so that only one worker touches the file at a time. This makes sure your database is safe under multi_process usage.
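
One possible shape for such a queue is a single writer loop that is the only code allowed to write to the file. This is a sketch under assumptions: writer_loop and the STOP sentinel are made up for this example, and it only handles add requests.

```python
import json

STOP = None  # assumed sentinel value: putting it on the queue shuts the writer down


def writer_loop(database_path: str, request_queue) -> None:
    """Take records from the queue one by one and append them to the file.

    request_queue can be any queue object with a blocking get() method,
    for example a multiprocessing.Queue shared with other processes.
    """
    with open(database_path, "a", encoding="utf-8") as file_stream:
        while True:
            record = request_queue.get()  # blocks until a request arrives
            if record is STOP:
                break
            file_stream.write(json.dumps(record) + "\n")
            file_stream.flush()  # make the new line visible to readers
```

In a multi_process program you could start it with multiprocessing.Process(target=writer_loop, args=(path, request_queue)), where request_queue is a multiprocessing.Queue. The other processes then only call request_queue.put(record) and never open the file for writing.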

For extremely long and big strings

We can set a variable limit = 255

If any field in a dict has a value length > 255, then we add '{"-use_additional_space": true}' and '{"-additional_json_path": "./random_hash.json"}' to that record.

Then we put the original JSON record into that additional_json file without deleting any big value.

But for the record in 'database.txt', we simply set every field value that is longer than 255 to null. (This is to speed up searching.)

And when we do a search, or any other operation, on such a record, we simply load its additional_json file into memory, then everything works as usual.
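
The steps above can be sketched as follows. It is a sketch under assumptions: only string values are checked against the limit, the file name comes from uuid4 as a stand-in for "random_hash", and the function names split_big_record and load_full_record are made up for this example.

```python
import json
import os
import uuid

LIMIT = 255


def split_big_record(record: dict, folder: str = ".") -> dict:
    """Return the small version of a record to store in database.txt.

    If any string value is longer than LIMIT, the full record is saved
    to a separate JSON file and the big values are replaced with None.
    """
    big_keys = [
        key for key, value in record.items()
        if isinstance(value, str) and len(value) > LIMIT
    ]
    if not big_keys:
        return dict(record)
    additional_json_path = os.path.join(folder, uuid.uuid4().hex + ".json")
    with open(additional_json_path, "w", encoding="utf-8") as file_stream:
        json.dump(record, file_stream)  # full record, no big value removed
    small = {key: (None if key in big_keys else value) for key, value in record.items()}
    small["-use_additional_space"] = True
    small["-additional_json_path"] = additional_json_path
    return small


def load_full_record(small: dict) -> dict:
    """Return the original record, loading the additional file when needed."""
    if not small.get("-use_additional_space"):
        return small
    with open(small["-additional_json_path"], "r", encoding="utf-8") as file_stream:
        return json.load(file_stream)
```

The small dict is what gets written as a line in database.txt; load_full_record is only called when the big values are really needed.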