
SQLite FTS5 performance on 2012 HDD-only server

I'm developing an app that uses the FTS5 module in SQLite for search functionality, and I want to know in advance what I can expect from it. Unfortunately, there are hardly any articles on the web that explicitly test FTS5 INSERT/SELECT query speed — besides LLM-generated articles full of nonsense, but with good SEO — so this blog entry is all about that.

Hardware

Dedicated server straight from 2012, running on Debian 13.1:

Why use an HDD-only system from 2012 in 2025? That's a good question, moving on.

Database

I wanted to test the FTS5 engine with a relatively small amount of data, so I took the Hacker News dataset (all posts from HN until 2024) and inserted it into one table. The numbers are:

What are Unicode61, Porter and Trigram?

Those are the tokenizers that ship by default with the SQLite3 FTS5 engine.

Porter itself is a wrapper around Unicode61 and allows a search term like "correction" to also match similar words such as "corrected" or "correcting". And Trigram extends FTS5 to support substring matching in general, instead of the usual token matching, which is good for fuzzy search (and is why that database is roughly three times larger).
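
For reference, this is roughly how the three test databases can be set up. A minimal sketch in Python, reduced to the columns queried in this post (title, text, url, by); the actual script and schema may differ:

import sqlite3

# One database per tokenizer. Column names are assumed from the queries
# in this post; "by" has to be quoted because BY is an SQL keyword.
TOKENIZERS = {
    "unicode61": "unicode61",
    "porter": "porter unicode61",
    "trigram": "trigram",
}

for name, tokenize in TOKENIZERS.items():
    db = sqlite3.connect(f"hn_{name}.db")
    db.execute(f"""
        CREATE VIRTUAL TABLE hn USING fts5(
            title, text, url, "by",
            tokenize = '{tokenize}'
        )
    """)
    db.close()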

It's worth saying that the default tokenizers are optimized mostly for English – good luck searching anything that requires word segmentation support. If you want to support CJK, use something like better-trigram. Testing this is out of scope for this blog entry.

SELECT queries performance with Unicode61 tokenizer

Let's start with a realistic scenario and then we'll bench it for real.

I want to search for 'test' everywhere: in the user's nickname, the text, the URL and the title. Also, I don't want to limit myself with petty LIMIT clauses. Let's see:

SELECT * FROM hn('test');

Just shy of 385345 results — a pretty good amount to display to a user on a single page. How long did that take?

real    5m51.913s

Come to think of it, maybe displaying 385345 results is suboptimal. Let's add a LIMIT after all:

SELECT * FROM hn('test') LIMIT 50;

real    0m0.027s
user    0m0.019s
sys     0m0.008s

Much better! Let's also try some slightly more complicated queries:

-- This is equivalent to "Airbnb AND ruins AND everything"
SELECT * FROM hn('Airbnb ruins everything');

4 results
real    0m0.040s
user    0m0.024s
sys     0m0.016s
-- phrase "crypto scam" must appear at start of a column
SELECT * FROM hn('^crypto + scam');

13 results
real    0m0.114s
user    0m0.028s
sys     0m0.008s

...you know what? That's pretty good for my case. But to find more results, we can utilize Porter and Trigram.

SELECT comparison between Unicode61, Porter and Trigram

Let's run the same queries as before and also record the number of results for each tokenizer. Each query is run once, with no prior warm-up runs, so no caching is involved.

SELECT * FROM db('query');
Tokenizer Result count Query duration (s)
Unicode61 97774 results 212.9179s
Porter 154188 results 244.6411s
Trigram 161511 results 333.1113s
SELECT * FROM db('correction');
Tokenizer Result count Query duration (s)
Unicode61 33067 results 112.4054s
Porter 477944 results 422.8689s
Trigram 46222 results 203.8271s
SELECT * FROM db('c pthreasd'); -- "c AND pthreasd", typo on purpose :)
Tokenizer Result count Query duration (s)
Unicode61 0 results 0.3662s
Porter 0 results 0.4749s
Trigram 0 results 31.0733s
SELECT * FROM db('"c pthreads"');
Tokenizer Result count Query duration (s)
Unicode61 11 results 0.6669s
Porter 11 results 0.8576s
Trigram 10 results 15.5765s
SELECT * FROM db('^somewhat longer text in here');
Tokenizer Result count Query duration (s)
Unicode61 1 result 15.0231s
Porter 1 result 16.0596s
Trigram 4 results 100.0801s
SELECT * FROM db('NEAR(rust python, 5) performance*');
Tokenizer Result count Query duration (s)
Unicode61 985 results 5.6444s
Porter 846 results 5.8575s
Trigram 160 results 74.0494s

We can clearly see that Trigram struggles, but sometimes returns the most results. What if we make some queries that use the unique feature of Trigram – that it can be queried by a sequence of characters, not only by complete tokens?

SELECT * FROM hn WHERE text LIKE '%dawg%';
Tokenizer Result count Query duration (s)
Unicode61 781 results 470.1948s
Porter 781 results 381.8981s
Trigram 781 results 3.2189s
SELECT * FROM hn WHERE title GLOB '*suit*';
Tokenizer Result count Query duration (s)
Unicode61 9248 results 562.7608s
Porter 9248 results 538.4104s
Trigram 9248 results 38.7541s

So while the Trigram database is roughly three times the size of the Unicode61 one, and quite a bit slower when matching rows by tokens, it is much faster for LIKE and GLOB queries...

SELECT * FROM hn WHERE title GLOB '*[^1-9]';
Tokenizer Result count Query duration (s)
Unicode61 4855683 results 584.2672s
Porter 4855683 results 582.6868s
Trigram 4855683 results 1410.1309s

But not with a pattern like this last one – there is no run of three or more literal characters for the trigram index to work with, so it's a linear scan for every party involved.

Benchmarkin'

Created using a little Python script that tries various PRAGMA combinations to measure changes in INSERT speed as well as SELECT performance.
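
The core of it looks roughly like this (a simplified sketch, not the exact script; the PRAGMA set, schema and batch size here are placeholders):

import sqlite3, time

def bench_insert(db_path, rows, pragmas, batch_size=1000):
    # Apply the PRAGMA combination under test, then time batched inserts.
    db = sqlite3.connect(db_path)
    for name, value in pragmas.items():
        db.execute(f"PRAGMA {name}={value}")
    start = time.perf_counter()
    for i in range(0, len(rows), batch_size):
        db.executemany(
            'INSERT INTO hn(title, text, url, "by") VALUES (?, ?, ?, ?)',
            rows[i:i + batch_size],
        )
        db.commit()
    elapsed = time.perf_counter() - start
    db.close()
    return elapsed, len(rows) / elapsed  # insert time (s), rows/s

# e.g. bench_insert("hn_unicode61.db", rows,
#                   {"journal_mode": "WAL", "synchronous": "NORMAL"})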

There are also quite a few variables:

Run settings:

Runs: 3
Dataset limit: 100000

We run every benchmark 3 times, with 100K rows in the database (around 55MB for unicode61, 150MB for trigram, per the file sizes below).

Query patterns:

QUERY_PATTERNS = {
    "simple_and": ["easy simple", "database performance optimization", "for some reason really long and"],
    "phrase_search": ['"machine learning"', 'ai + shit', '"data science"'],
    "prefix_search": ["python*", "data*", "web*"],
    "initial_token": ["^python", "^data", "^web"],
    "near_operator": ["NEAR(rust sqlite, 5)", "NEAR(react htmx, 3)", "NEAR(ml shit, 2)"],
    "column_filter": ["title: python", "text: database", "by: dawg"], "prefix_and_initial_token": ["^python*"],
    "complex": ["NEAR(rust python, 5) performance*", "^performance NEAR(rust python, 3)"]
}

Those are used to calculate the SELECT timing median.
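
The "Query Median (ms)" column is, roughly, the median wall-clock time over all of those patterns (again a simplified sketch; warm-up and LIMIT handling are omitted):

import sqlite3, time, statistics

def query_median_ms(db, patterns):
    timings = []
    for queries in patterns.values():
        for q in queries:
            start = time.perf_counter()
            db.execute("SELECT * FROM hn WHERE hn MATCH ?", (q,)).fetchall()
            timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)

# e.g. query_median_ms(sqlite3.connect("hn_unicode61.db"), QUERY_PATTERNS)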

Sorting

Tables are sorted by Insert Time (s) – this is how much time was required to insert the whole dataset.

Maximum Performance Combinations

FTS5 table config: {'prefix': '2 3'}
FTS5 runtime config: {'pgsz': 60000, 'automerge': 16, 'crisismerge': 8}
Tokenizer used: porter
Transaction strategy: batch_500_commit_5000

Config Insert Time (s) Insert rows/s (Mdn/P99/P95) StdDev Query Median (ms) Safety
{'mmap_size': 28, 'page_size': 16384, 'synchronous': 'OFF', 'journal_mode': 'OFF', 'temp_store': 'FILE', 'threads': 1, 'cache_size': -8192, 'secure_delete': 'OFF', 'locking_mode': 'NORMAL'} 19.63 5094/5105/5104 ±0.05 1.21 UNSAFE
{'mmap_size': 28, 'page_size': 16384, 'synchronous': 'NORMAL', 'journal_mode': 'WAL', 'temp_store': 'FILE', 'threads': 1, 'cache_size': -8192, 'secure_delete': 'OFF', 'locking_mode': 'NORMAL'} 22.14 4517/4525/4525 ±0.06 1.55 SAFE
{'synchronous': 'EXTRA', 'cell_size_check': 'ON', 'foreign_keys': 'ON', 'mmap_size': 0, 'journal_mode': 'WAL', 'page_size': 16384, 'temp_store': 'FILE', 'cache_size': -8192, 'threads': 1} 24.61 4063/4070/4070 ±0.08 1.31 EXTRA SAFE
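
If the FTS5-specific options above look unfamiliar: prefix is a table-creation option, while pgsz, automerge and crisismerge are runtime options set through FTS5's special configuration INSERTs. A minimal sketch (table and file names are placeholders):

import sqlite3

db = sqlite3.connect("hn_porter.db")
# Prefix indexes for 2- and 3-character prefixes are part of the table definition.
db.execute("""
    CREATE VIRTUAL TABLE hn USING fts5(
        title, text, url, "by",
        tokenize = 'porter',
        prefix = '2 3'
    )
""")
# pgsz, automerge and crisismerge are set via the special
# "INSERT INTO tbl(tbl, rank)" configuration syntax.
db.execute("INSERT INTO hn(hn, rank) VALUES('pgsz', 60000)")
db.execute("INSERT INTO hn(hn, rank) VALUES('automerge', 16)")
db.execute("INSERT INTO hn(hn, rank) VALUES('crisismerge', 8)")
db.commit()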

FTS5 Tokenizer Performance

Tokenizer used: unicode61
Transaction strategy: batch_1000_commit_1000

Tokenizer Insert Time (s) Insert rows/s (Mdn/P99/P95) StdDev File Size (KB)
porter 35.57 2811/2834/2832 ±0.44 55664
unicode61 remove_diacritics 2 35.70 2801/2818/2816 ±0.31 56356
porter ascii 35.96 2781/2838/2833 ±0.68 55664
unicode61 remove_diacritics 1 36.27 2757/2862/2853 ±0.79 56356
ascii 36.45 2743/2762/2760 ±0.34 56352
unicode61 37.01 2702/2707/2707 ±0.41 56356
unicode61 remove_diacritics 0 37.47 2669/2719/2715 ±0.52 56356
trigram 58.97 1696/1741/1737 ±1.26 155904

Individual PRAGMA Performance

Synchronous

Tokenizer used: unicode61
Transaction strategy: batch_1000_commit_1000

Value Insert Time (s) Insert rows/s (Mdn/P99/P95) StdDev Query Median (ms)
OFF 20.35 4915/4958/4954 ±0.11 1.03
NORMAL 35.39 2826/2855/2853 ±0.32 1.35
FULL 35.44 2822/2886/2881 ±0.67 1.33
EXTRA 37.20 2688/2688/2688 ±0.28 1.31

Temp Store

Tokenizer used: unicode61
Transaction strategy: batch_1000_commit_1000

Value Insert Time (s) Insert rows/s (Mdn/P99/P95) StdDev Query Median (ms)
FILE 35.32 2831/2832/2832 ±0.29 1.03
MEMORY 35.71 2801/2804/2804 ±0.03 1.08

Cache Size

Tokenizer used: unicode61
Transaction strategy: batch_1000_commit_1000

Value Insert Time (s) Insert rows/s (Mdn/P99/P95) StdDev Query Median (ms)
-8192 33.36 2997/3010/3009 ±0.46 1.26
-524288 33.91 2949/3039/3032 ±0.94 1.15
-65536 34.12 2931/2966/2963 ±0.49 1.15
-16384 34.27 2918/2938/2936 ±0.69 1.00
-2000 35.79 2794/2807/2806 ±0.25 1.06

Journal Mode

Tokenizer used: unicode61
Transaction strategy: batch_1000_commit_1000

Value Insert Time (s) Insert rows/s (Mdn/P99/P95) StdDev Query Median (ms)
WAL 27.54 3631/3636/3636 ±0.47 1.38
OFF 28.28 3536/3570/3567 ±0.29 1.27
MEMORY 28.39 3522/3538/3537 ±0.19 0.97
DELETE 36.03 2776/2817/2814 ±0.44 1.31
TRUNCATE 37.68 2654/2666/2665 ±0.24 1.09
PERSIST 39.81 2512/2523/2522 ±0.20 1.34

Page Size

Tokenizer used: unicode61
Transaction strategy: batch_1000_commit_1000

Value Insert Time (s) Insert rows/s (Mdn/P99/P95) StdDev Query Median (ms)
16384 31.46 3179/3199/3198 ±0.73 1.57
32768 32.12 3113/3120/3120 ±0.17 1.44
8192 33.94 2946/2982/2979 ±0.26 1.21
4096 35.66 2804/2822/2820 ±0.22 1.07
1024 43.03 2324/2325/2325 ±0.36 1.42

Mmap Size

Tokenizer used: unicode61
Transaction strategy: batch_1000_commit_1000

Value Insert Time (s) Insert rows/s (Mdn/P99/P95) StdDev Query Median (ms)
28 36.07 2772/2804/2801 ±0.34 1.34
30 36.18 2764/2795/2792 ±0.24 1.38
0 36.24 2760/2800/2797 ±0.65 1.33

Locking Mode

Tokenizer used: unicode61
Transaction strategy: batch_1000_commit_1000

Value Insert Time (s) Insert rows/s (Mdn/P99/P95) StdDev Query Median (ms)
NORMAL 36.31 2754/2789/2786 ±0.30 1.36
EXCLUSIVE 36.98 2705/2714/2714 ±1.07 1.38

Secure Delete

Tokenizer used: unicode61
Transaction strategy: batch_1000_commit_1000

Value Insert Time (s) Insert rows/s (Mdn/P99/P95) StdDev Query Median (ms)
OFF 33.24 3009/3048/3045 ±0.34 1.32
ON 36.05 2774/2777/2777 ±0.31 1.34

Threads

Tokenizer used: unicode61
Transaction strategy: batch_1000_commit_1000

Value Insert Time (s) Insert rows/s (Mdn/P99/P95) StdDev Query Median (ms)
1 35.95 2782/2789/2788 ±0.15 1.08
2 36.07 2773/2779/2778 ±0.26 1.26
0 36.40 2747/2772/2770 ±0.22 1.36

Fastest Performing Combinations (defined in INSERT_COMBOS)

Tokenizer used: unicode61
Transaction strategy: batch_1000_commit_1000

Config Insert Time (s) Insert rows/s (Mdn/P99/P95) StdDev Checkpoint (s) File Size (KB) Query Median (ms)
{'journal_mode': 'OFF', 'synchronous': 'OFF', 'locking_mode': 'EXCLUSIVE'} 18.92 5285/5286/5286 ±0.00 0.00 56356 0.94
{'journal_mode': 'MEMORY', 'synchronous': 'OFF', 'temp_store': 'MEMORY'} 18.95 5277/5283/5282 ±0.03 0.00 56356 0.93
{'journal_mode': 'MEMORY', 'synchronous': 'OFF', 'temp_store': 'MEMORY', 'locking_mode': 'EXCLUSIVE'} 18.98 5268/5274/5274 ±0.03 0.00 56356 0.92
{'journal_mode': 'WAL', 'synchronous': 'OFF'} 20.23 4944/4948/4948 ±0.20 0.00 56356 0.97
{'journal_mode': 'WAL', 'synchronous': 'NORMAL', 'temp_store': 'MEMORY', 'page_size': 8192, 'cache_size': -65536, 'mmap_size': 1073741824, 'wal_autocheckpoint': 10000} 20.93 4778/4801/4800 ±0.07 0.28 56356 1.09
{'journal_mode': 'WAL', 'synchronous': 'NORMAL', 'wal_autocheckpoint': 10000} 22.09 4528/4574/4571 ±0.19 0.30 56356 1.34
{'journal_mode': 'WAL', 'synchronous': 'NORMAL', 'temp_store': 'MEMORY', 'cache_size': -65536, 'mmap_size': 1073741824} 25.76 3882/3888/3887 ±0.11 0.13 56356 1.09
{'journal_mode': 'WAL', 'synchronous': 'NORMAL'} 26.81 3730/3735/3735 ±0.12 0.16 56356 1.36
{'journal_mode': 'WAL', 'synchronous': 'FULL'} 27.56 3628/3659/3657 ±0.20 0.12 56356 1.36
{'journal_mode': 'WAL', 'synchronous': 'EXTRA'} 28.25 3540/3594/3589 ±0.26 0.12 56356 1.31
{'page_size': 16384, 'cache_size': -65536} 31.29 3195/3212/3210 ±0.40 0.00 54256 0.75
{'page_size': 16384, 'cache_size': -4096} 31.36 3188/3223/3220 ±0.44 0.00 54256 1.58
{'page_size': 16384, 'cache_size': -16384} 31.58 3166/3183/3182 ±0.12 0.00 54256 1.28
{'mmap_size': 0, 'cache_size': -65536} 32.51 3076/3077/3077 ±0.36 0.00 56356 1.01
{'page_size': 65536, 'cache_size': -16384} 32.93 3036/3038/3038 ±0.23 0.00 55296 2.02
{'mmap_size': 28, 'cache_size': -65536} 33.14 3018/3050/3047 ±0.30 0.00 56356 1.16
{'mmap_size': 28, 'cache_size': -8192} 33.15 3017/3064/3060 ±0.59 0.00 56356 1.26
{'page_size': 4096, 'cache_size': -16384} 33.21 3011/3058/3054 ±0.46 0.00 56356 1.15
{'page_size': 65536, 'cache_size': -65536} 33.22 3010/3051/3048 ±0.39 0.00 55296 1.17
{'page_size': 0, 'cache_size': -16384} 33.29 3004/3058/3054 ±0.37 0.00 56356 1.22
{'mmap_size': 0, 'cache_size': -8192} 33.34 3000/3033/3030 ±0.25 0.00 56356 1.19
{'page_size': 0, 'cache_size': -65536} 33.36 2997/3092/3084 ±0.85 0.00 56356 0.84
{'page_size': 4096, 'cache_size': -65536} 33.41 2993/3037/3033 ±0.76 0.00 56356 1.16
{'page_size': 4096, 'cache_size': -4096} 33.51 2984/2996/2995 ±0.25 0.00 56356 1.33
{'page_size': 0, 'cache_size': -4096} 33.62 2974/2979/2979 ±0.47 0.00 56356 1.35
{'page_size': 65536, 'cache_size': -4096} 33.65 2972/2994/2992 ±0.50 0.00 55296 2.21
{'page_size': 65536, 'cache_size': 0} 76.92 1300/1301/1301 ±0.71 0.00 55296 3.89
{'page_size': 16384, 'cache_size': 0} 112.16 892/912/911 ±1.83 0.00 54256 2.10
{'page_size': 4096, 'cache_size': 0} 301.86 331/333/333 ±3.28 0.00 56356 1.80
{'mmap_size': 0, 'cache_size': 0} 302.54 331/331/331 ±0.91 0.00 56356 1.72
{'mmap_size': 28, 'cache_size': 0} 303.38 330/331/331 ±1.44 0.00 56356 1.75
{'page_size': 0, 'cache_size': 0} 304.64 328/330/330 ±1.42 0.00 56356 1.76

Transaction Strategy Impact

PRAGMAs: {'journal_mode': 'OFF', 'synchronous': 'OFF', 'locking_mode': 'EXCLUSIVE'}
Tokenizer used: unicode61
Transaction strategy: batch_10000_commit_10000

Strategy Insert Time (s) Batch Size Insert rows/s (Mdn/P99/P95) StdDev Query Median (ms)
batch_500_commit_5000 18.57 500 5386/5389/5389 ±0.03 0.94
batch_1000_commit_10000 18.58 1000 5382/5383/5383 ±0.00 0.98
batch_10000_commit_10000 18.66 10000 5358/5364/5363 ±0.02 1.02
batch_1000_commit_1000 18.87 1000 5299/5300/5300 ±0.00 0.93
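
For clarity, batch_N_commit_M means rows are inserted with executemany() in chunks of N and the transaction is committed every M rows, roughly like this (a simplified sketch, not the exact script):

import sqlite3

def insert_with_strategy(db, rows, batch_size=500, commit_every=5000):
    # batch_500_commit_5000: executemany() in chunks of 500 rows,
    # with a COMMIT once every 5000 rows.
    rows_since_commit = 0
    for i in range(0, len(rows), batch_size):
        chunk = rows[i:i + batch_size]
        db.executemany(
            'INSERT INTO hn(title, text, url, "by") VALUES (?, ?, ?, ?)',
            chunk,
        )
        rows_since_commit += len(chunk)
        if rows_since_commit >= commit_every:
            db.commit()
            rows_since_commit = 0
    db.commit()  # flush whatever is left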