SQLite FTS5 performance on 2012 HDD-only server
I'm developing an app that uses the FTS5 module in SQLite for search functionality and it just so happens that I want to know in advance what I can expect from it. Unfortunately, there are hardly any articles on the web that explicitly test FTS5 INSERT/SELECT queries speed — besides LLM-generated articles full of nonsense, but with good SEO — so this blog entry is all about that.
Hardware
Dedicated server straight from 2012, running on Debian 13.1:
- CPU: Intel Xeon E3-1220 V2
-
RAM: 32GiB DDR3 1600 MHz
- 4 x Micron 18KSF1G72AZ-1G6P1 (8 GiB)
-
Storage: 7.3TB, RAID 0 (306 iops)
- 2 x Toshiba MC04ACA400E (3.6T)
Why use HDD-only system from 2012 in 2025? That's a good question, moving on. 1
Database
I wanted to test FTS5 engine with relatively small amount of data, so I took hackernews dataset (all posts from HN til 2024) end inserted it into one table and the numbers are:
- 28GB in size (unicode61, porter)
- 74GB in size (trigram)
- 41'400'000 rows, 4 columns (by, text, title, url)
What's the Unicode61, Porter and Trigram?
Those are tokenizers that are available by default with SQLite3 FTS5 engine.
Porter itself is a wrapper around Unicode61 and also allows you search terms like "correction" to match similar words such as "corrected" or "correcting". And Trigram extends FTS5 to support substring matching in general, instead of the usual token matching, which is good for fuzzy search (and hence triple the size of the database).
It worth saying that default tokenizers are optimized mostly for English – good luck with searching anything that requires word segmentation support. If you want to support CJK, then use something like better-trigram. Testing this is out of scope of this blog entry.
SELECT queries performance with Unicode61 tokenizer
Let's start with realistic scenario and then we will bench it for real.
I want to search for 'test' everywhere: in the user's nickname, the text, the URL and the title. Also, I don't want to limit myself with pity LIMIT statements. Let's see:
SELECT * FROM hn('test');
Just shy of 385345 results — a pretty good amount to display to a user on a single page. How long did that take?
real 5m51.913s
Come to think of it maybe displaying 385345 results is suboptimal. Let's add LIMIT after all:
SELECT * FROM hn('test') LIMIT 50;
real 0m0.027s
user 0m0.019s
sys 0m0.008s
Much better! Let's also try a little bit more complicated queries
-- This equals to "Airbnb AND ruins AND everything"
SELECT * FROM hn('Airbnb ruins everything');
4 results
real 0m0.040s
user 0m0.024s
sys 0m0.016s
-- phrase "crypto scam" must appear at start of a column
SELECT * FROM hn('^crypto + scam')
13 results
real 0m0.114s
user 0m0.028s
sys 0m0.008s
...you know what? That's pretty good for my case. But to find more results, we can utilize Porter and Trigram.
SELECT comparison between Unicode61, Porter and Trigram
Let's do same queries as before and also record amount of results for each tokenizer. Those queries are being run once without prior runs, so no caching is involved.
Tokenizer | Results amount | Query duration (s) |
---|---|---|
Unicode61 | 97774 results | 212.9179s |
Porter | 154188 results | 244.6411s |
Trigram | 161511 results | 333.1113s |
Tokenizer | Results amount | Query duration (s) |
---|---|---|
Unicode61 | 33067 results | 112.4054s |
Porter | 477944 results | 422.8689s |
Trigram | 46222 results | 203.8271s |
Tokenizer | Results amount | Query duration (s) |
---|---|---|
Unicode61 | 0 results | 0.3662s |
Porter | 0 results | 0.4749s |
Trigram | 0 results | 31.0733s |
Tokenizer | Results amount | Query duration (s) |
---|---|---|
Unicode61 | 11 results | 0.6669s |
Porter | 11 results | 0.8576s |
Trigram | 10 results | 15.5765s |
Tokenizer | Results amount | Query duration (s) |
---|---|---|
Unicode61 | 1 result | 15.0231s |
Porter | 1 result | 16.0596s |
Trigram | 4 results | 100.0801s |
Tokenizer | Results amount | Query duration (s) |
---|---|---|
Unicode61 | 985 results | 5.6444s |
Porter | 846 results | 5.8575s |
Trigram | 160 results | 74.0494s |
We can clearly see that Trigram struggles, but sometimes returns the most results. What if we make some queries that utilize unique feature about Trigram – that it can be queried by a sequence of characters, not only by complete tokens?
Tokenizer | Results amount | Query duration (s) |
---|---|---|
Unicode61 | 781 results | 470.1948s |
Porter | 781 results | 381.8981s |
Trigram | 781 results | 3.2189s |
Tokenizer | Results amount | Query duration (s) |
---|---|---|
Unicode61 | 9248 results | 562.7608s |
Porter | 9248 results | 538.4104s |
Trigram | 9248 results | 38.7541s |
So while Trigram is 3 times the size of Unicode61 database and quite a bit slower in cases when matching rows by tokens it is much faster in LIKE and GLOB queries...
Tokenizer | Results amount | Query duration (s) |
---|---|---|
Unicode61 | 4855683 results | 584.2672s |
Porter | 4855683 results | 582.6868s |
Trigram | 4855683 results | 1410.1309s |
But not with regex – that's linear scan for every party involved.
Benchmarkin'
Created using little python script that tries various PRAGMA combinations to check changes in INSERT speed as well as SELECT performance.
There's also quite a few variables:
- The performance measurements are specific to the Python sqlite3 standard library, which might not be representative of the underlying SQLite capabilities. And I also didn't bother with testing in other languages and different libraries.
- SELECTs are run once per benchmark, which means SQLite3 can't cache the response. Every benchmark creates a new SQLite3 database.
Run settings:
Runs: 3
Dataset limit: 100000
We run every benchmark 3 times, 100K rows in the database (around 5MB for unicode61, 15MB for trigram).
Query patterns:
QUERY_PATTERNS = {
"simple_and": ["easy simple", "database performance optimization", "for some reason really long and"],
"phrase_search": ['"machine learning"', 'ai + shit', '"data science"'],
"prefix_search": ["python*", "data*", "web*"],
"initial_token": ["^python", "^data", "^web"],
"near_operator": ["NEAR(rust sqlite, 5)", "NEAR(react htmx, 3)", "NEAR(ml shit, 2)"],
"column_filter": ["title: python", "text: database", "by: dawg"], "prefix_and_initial_token": ["^python*"],
"complex": ["NEAR(rust python, 5) performance*", "^performance NEAR(rust python, 3)"]
}
Those are used to calculate SELECT timing median.
Sorting
Tables are sorted by Insert Time (s) – this is how much time was required to insert the whole dataset.
Maximum Performance Combinations
FTS5 table config: {'prefix': '2 3'}
FTS5 runtime config: {'pgsz': 60000, 'automerge': 16, 'crisismerge': 8}
Tokenizer used: porter
Transcation strategy: batch_500_commit_5000
Config | Insert Time (s) | Insert rows/s (Mdn/P99/P95) | StdDev | Query Median (ms) | Safety |
---|---|---|---|---|---|
{'mmap_size': 28, 'page_size': 16384, 'synchronous': 'OFF', 'journal_mode': 'OFF', 'temp_store': 'FILE', 'threads': 1, 'cache_size': -8192, 'secure_delete': 'OFF', 'locking_mode': 'NORMAL'} | 19.63 | 5094/5105/5104 | ±0.05 | 1.21 | UNSAFE |
{'mmap_size': 28, 'page_size': 16384, 'synchronous': 'NORMAL', 'journal_mode': 'WAL', 'temp_store': 'FILE', 'threads': 1, 'cache_size': -8192, 'secure_delete': 'OFF', 'locking_mode': 'NORMAL'} | 22.14 | 4517/4525/4525 | ±0.06 | 1.55 | SAFE |
{'synchronous': 'EXTRA', 'cell_size_check': 'ON', 'foreign_keys': 'ON', 'mmap_size': 0, 'journal_mode': 'WAL', 'page_size': 16384, 'temp_store': 'FILE', 'cache_size': -8192, 'threads': 1} | 24.61 | 4063/4070/4070 | ±0.08 | 1.31 | EXTRA SAFE |
FTS5 Tokenizer Performance
Tokenizer used: unicode61
Transcation strategy: batch_1000_commit_1000
Tokenizer | Insert Time (s) | Insert rows/s (Mdn/P99/P95) | StdDev | File Size (KB) |
---|---|---|---|---|
porter | 35.57 | 2811/2834/2832 | ±0.44 | 55664 |
unicode61 remove_diacritics 2 | 35.70 | 2801/2818/2816 | ±0.31 | 56356 |
porter ascii | 35.96 | 2781/2838/2833 | ±0.68 | 55664 |
unicode61 remove_diacritics 1 | 36.27 | 2757/2862/2853 | ±0.79 | 56356 |
ascii | 36.45 | 2743/2762/2760 | ±0.34 | 56352 |
unicode61 | 37.01 | 2702/2707/2707 | ±0.41 | 56356 |
unicode61 remove_diacritics 0 | 37.47 | 2669/2719/2715 | ±0.52 | 56356 |
trigram | 58.97 | 1696/1741/1737 | ±1.26 | 155904 |
Individual PRAGMA Performance
Synchronous
Tokenizer used: unicode61
Transcation strategy: batch_1000_commit_1000
Value | Insert Time (s) | Insert rows/s (Mdn/P99/P95) | StdDev | Query Median (ms) |
---|---|---|---|---|
OFF | 20.35 | 4915/4958/4954 | ±0.11 | 1.03 |
NORMAL | 35.39 | 2826/2855/2853 | ±0.32 | 1.35 |
FULL | 35.44 | 2822/2886/2881 | ±0.67 | 1.33 |
EXTRA | 37.20 | 2688/2688/2688 | ±0.28 | 1.31 |
Temp Store
Tokenizer used: unicode61
Transcation strategy: batch_1000_commit_1000
Value | Insert Time (s) | Insert rows/s (Mdn/P99/P95) | StdDev | Query Median (ms) |
---|---|---|---|---|
FILE | 35.32 | 2831/2832/2832 | ±0.29 | 1.03 |
MEMORY | 35.71 | 2801/2804/2804 | ±0.03 | 1.08 |
Cache Size
Tokenizer used: unicode61
Transcation strategy: batch_1000_commit_1000
Value | Insert Time (s) | Insert rows/s (Mdn/P99/P95) | StdDev | Query Median (ms) |
---|---|---|---|---|
-8192 | 33.36 | 2997/3010/3009 | ±0.46 | 1.26 |
-524288 | 33.91 | 2949/3039/3032 | ±0.94 | 1.15 |
-65536 | 34.12 | 2931/2966/2963 | ±0.49 | 1.15 |
-16384 | 34.27 | 2918/2938/2936 | ±0.69 | 1.00 |
-2000 | 35.79 | 2794/2807/2806 | ±0.25 | 1.06 |
Journal Mode
Tokenizer used: unicode61
Transcation strategy: batch_1000_commit_1000
Value | Insert Time (s) | Insert rows/s (Mdn/P99/P95) | StdDev | Query Median (ms) |
---|---|---|---|---|
WAL | 27.54 | 3631/3636/3636 | ±0.47 | 1.38 |
OFF | 28.28 | 3536/3570/3567 | ±0.29 | 1.27 |
MEMORY | 28.39 | 3522/3538/3537 | ±0.19 | 0.97 |
DELETE | 36.03 | 2776/2817/2814 | ±0.44 | 1.31 |
TRUNCATE | 37.68 | 2654/2666/2665 | ±0.24 | 1.09 |
PERSIST | 39.81 | 2512/2523/2522 | ±0.20 | 1.34 |
Page Size
Tokenizer used: unicode61
Transcation strategy: batch_1000_commit_1000
Value | Insert Time (s) | Insert rows/s (Mdn/P99/P95) | StdDev | Query Median (ms) |
---|---|---|---|---|
16384 | 31.46 | 3179/3199/3198 | ±0.73 | 1.57 |
32768 | 32.12 | 3113/3120/3120 | ±0.17 | 1.44 |
8192 | 33.94 | 2946/2982/2979 | ±0.26 | 1.21 |
4096 | 35.66 | 2804/2822/2820 | ±0.22 | 1.07 |
1024 | 43.03 | 2324/2325/2325 | ±0.36 | 1.42 |
Mmap Size
Tokenizer used: unicode61
Transcation strategy: batch_1000_commit_1000
Value | Insert Time (s) | Insert rows/s (Mdn/P99/P95) | StdDev | Query Median (ms) |
---|---|---|---|---|
28 | 36.07 | 2772/2804/2801 | ±0.34 | 1.34 |
30 | 36.18 | 2764/2795/2792 | ±0.24 | 1.38 |
0 | 36.24 | 2760/2800/2797 | ±0.65 | 1.33 |
Locking Mode
Tokenizer used: unicode61
Transcation strategy: batch_1000_commit_1000
Value | Insert Time (s) | Insert rows/s (Mdn/P99/P95) | StdDev | Query Median (ms) |
---|---|---|---|---|
NORMAL | 36.31 | 2754/2789/2786 | ±0.30 | 1.36 |
EXCLUSIVE | 36.98 | 2705/2714/2714 | ±1.07 | 1.38 |
Secure Delete
Tokenizer used: unicode61
Transcation strategy: batch_1000_commit_1000
Value | Insert Time (s) | Insert rows/s (Mdn/P99/P95) | StdDev | Query Median (ms) |
---|---|---|---|---|
OFF | 33.24 | 3009/3048/3045 | ±0.34 | 1.32 |
ON | 36.05 | 2774/2777/2777 | ±0.31 | 1.34 |
Threads
Tokenizer used: unicode61
Transcation strategy: batch_1000_commit_1000
Value | Insert Time (s) | Insert rows/s (Mdn/P99/P95) | StdDev | Query Median (ms) |
---|---|---|---|---|
1 | 35.95 | 2782/2789/2788 | ±0.15 | 1.08 |
2 | 36.07 | 2773/2779/2778 | ±0.26 | 1.26 |
0 | 36.40 | 2747/2772/2770 | ±0.22 | 1.36 |
Fastest Performing Combinations (defined in INSERT_COMBOS)
Tokenizer used: unicode61
Transcation strategy: batch_1000_commit_1000
Config | Insert Time (s) | Insert rows/s (Mdn/P99/P95) | StdDev | Checkpoint (s) | File Size (KB) | Query Median (ms) |
---|---|---|---|---|---|---|
{'journal_mode': 'OFF', 'synchronous': 'OFF', 'locking_mode': 'EXCLUSIVE'} | 18.92 | 5285/5286/5286 | ±0.00 | 0.00 | 56356 | 0.94 |
{'journal_mode': 'MEMORY', 'synchronous': 'OFF', 'temp_store': 'MEMORY'} | 18.95 | 5277/5283/5282 | ±0.03 | 0.00 | 56356 | 0.93 |
{'journal_mode': 'MEMORY', 'synchronous': 'OFF', 'temp_store': 'MEMORY', 'locking_mode': 'EXCLUSIVE'} | 18.98 | 5268/5274/5274 | ±0.03 | 0.00 | 56356 | 0.92 |
{'journal_mode': 'WAL', 'synchronous': 'OFF'} | 20.23 | 4944/4948/4948 | ±0.20 | 0.00 | 56356 | 0.97 |
{'journal_mode': 'WAL', 'synchronous': 'NORMAL', 'temp_store': 'MEMORY', 'page_size': 8192, 'cache_size': -65536, 'mmap_size': 1073741824, 'wal_autocheckpoint': 10000} | 20.93 | 4778/4801/4800 | ±0.07 | 0.28 | 56356 | 1.09 |
{'journal_mode': 'WAL', 'synchronous': 'NORMAL', 'wal_autocheckpoint': 10000} | 22.09 | 4528/4574/4571 | ±0.19 | 0.30 | 56356 | 1.34 |
{'journal_mode': 'WAL', 'synchronous': 'NORMAL', 'temp_store': 'MEMORY', 'cache_size': -65536, 'mmap_size': 1073741824} | 25.76 | 3882/3888/3887 | ±0.11 | 0.13 | 56356 | 1.09 |
{'journal_mode': 'WAL', 'synchronous': 'NORMAL'} | 26.81 | 3730/3735/3735 | ±0.12 | 0.16 | 56356 | 1.36 |
{'journal_mode': 'WAL', 'synchronous': 'FULL'} | 27.56 | 3628/3659/3657 | ±0.20 | 0.12 | 56356 | 1.36 |
{'journal_mode': 'WAL', 'synchronous': 'EXTRA'} | 28.25 | 3540/3594/3589 | ±0.26 | 0.12 | 56356 | 1.31 |
{'page_size': 16384, 'cache_size': -65536} | 31.29 | 3195/3212/3210 | ±0.40 | 0.00 | 54256 | 0.75 |
{'page_size': 16384, 'cache_size': -4096} | 31.36 | 3188/3223/3220 | ±0.44 | 0.00 | 54256 | 1.58 |
{'page_size': 16384, 'cache_size': -16384} | 31.58 | 3166/3183/3182 | ±0.12 | 0.00 | 54256 | 1.28 |
{'mmap_size': 0, 'cache_size': -65536} | 32.51 | 3076/3077/3077 | ±0.36 | 0.00 | 56356 | 1.01 |
{'page_size': 65536, 'cache_size': -16384} | 32.93 | 3036/3038/3038 | ±0.23 | 0.00 | 55296 | 2.02 |
{'mmap_size': 28, 'cache_size': -65536} | 33.14 | 3018/3050/3047 | ±0.30 | 0.00 | 56356 | 1.16 |
{'mmap_size': 28, 'cache_size': -8192} | 33.15 | 3017/3064/3060 | ±0.59 | 0.00 | 56356 | 1.26 |
{'page_size': 4096, 'cache_size': -16384} | 33.21 | 3011/3058/3054 | ±0.46 | 0.00 | 56356 | 1.15 |
{'page_size': 65536, 'cache_size': -65536} | 33.22 | 3010/3051/3048 | ±0.39 | 0.00 | 55296 | 1.17 |
{'page_size': 0, 'cache_size': -16384} | 33.29 | 3004/3058/3054 | ±0.37 | 0.00 | 56356 | 1.22 |
{'mmap_size': 0, 'cache_size': -8192} | 33.34 | 3000/3033/3030 | ±0.25 | 0.00 | 56356 | 1.19 |
{'page_size': 0, 'cache_size': -65536} | 33.36 | 2997/3092/3084 | ±0.85 | 0.00 | 56356 | 0.84 |
{'page_size': 4096, 'cache_size': -65536} | 33.41 | 2993/3037/3033 | ±0.76 | 0.00 | 56356 | 1.16 |
{'page_size': 4096, 'cache_size': -4096} | 33.51 | 2984/2996/2995 | ±0.25 | 0.00 | 56356 | 1.33 |
{'page_size': 0, 'cache_size': -4096} | 33.62 | 2974/2979/2979 | ±0.47 | 0.00 | 56356 | 1.35 |
{'page_size': 65536, 'cache_size': -4096} | 33.65 | 2972/2994/2992 | ±0.50 | 0.00 | 55296 | 2.21 |
{'page_size': 65536, 'cache_size': 0} | 76.92 | 1300/1301/1301 | ±0.71 | 0.00 | 55296 | 3.89 |
{'page_size': 16384, 'cache_size': 0} | 112.16 | 892/912/911 | ±1.83 | 0.00 | 54256 | 2.10 |
{'page_size': 4096, 'cache_size': 0} | 301.86 | 331/333/333 | ±3.28 | 0.00 | 56356 | 1.80 |
{'mmap_size': 0, 'cache_size': 0} | 302.54 | 331/331/331 | ±0.91 | 0.00 | 56356 | 1.72 |
{'mmap_size': 28, 'cache_size': 0} | 303.38 | 330/331/331 | ±1.44 | 0.00 | 56356 | 1.75 |
{'page_size': 0, 'cache_size': 0} | 304.64 | 328/330/330 | ±1.42 | 0.00 | 56356 | 1.76 |
Transaction Strategy Impact
PRAGMAs: {'journal_mode': 'OFF', 'synchronous': 'OFF', 'locking_mode': 'EXCLUSIVE'}
Tokenizer used: unicode61
Transcation strategy: batch_10000_commit_10000
Strategy | Insert Time (s) | Batch Size | Insert rows/s (Mdn/P99/P95) | StdDev | Query Median (ms) |
---|---|---|---|---|---|
batch_500_commit_5000 | 18.57 | 500 | 5386/5389/5389 | ±0.03 | 0.94 |
batch_1000_commit_10000 | 18.58 | 1000 | 5382/5383/5383 | ±0.00 | 0.98 |
batch_10000_commit_10000 | 18.66 | 10000 | 5358/5364/5363 | ±0.02 | 1.02 |
batch_1000_commit_1000 | 18.87 | 1000 | 5299/5300/5300 | ±0.00 | 0.93 |