Discover Top Posts Tagged with #b-tree

SQLite's HC-tree Project

A new project I saw on the Hacker News front page: "HC-tree is an experimental high-concurrency database back end for SQLite (sqlite.org)". SQLite has built an experimental backend called HC-tree: "The HC-tree (hctree) project is an attempt to develop a new database backend that improves upon regular SQLite as follows:" It lists several key points. One of them, "Improved concurrency", mentions that multiple writers can write at the same time, which is a big change for SQLite. For now the goal is to be no slower than existing SQLite in the single-threaded case: An implicit…


#b-tree #backend #concurrency #database #db #hc-tree #isolation #mvcc #rdbms #replication #sqlite #writer

The B-tree Structure in PostgreSQL

In "Indexes in PostgreSQL — 4 (Btree)" I came across an explanation of PostgreSQL's B-tree structure and of how common queries use the B-tree.

It covers three kinds of queries: the first is search by equality, the second is search by inequality, and the third is search by range. After that it covers sorting and how the index is used for it.
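The three query types can be illustrated with a minimal Python sketch. It assumes a plain sorted list as a stand-in for the ordered leaf level of a B-tree; the `keys` data and the function names are hypothetical and are not taken from the linked article or from PostgreSQL itself. The point is only that one ordered structure answers all three kinds of search with a binary search followed by a sequential walk.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sorted keys standing in for the leaf level of a B-tree index.
keys = [3, 7, 7, 12, 18, 25, 31]

def search_equal(sorted_keys, value):
    """Positions of all entries equal to value (search by equality)."""
    lo = bisect_left(sorted_keys, value)
    hi = bisect_right(sorted_keys, value)
    return list(range(lo, hi))

def search_greater(sorted_keys, value):
    """Positions of all entries strictly greater than value (search by inequality)."""
    start = bisect_right(sorted_keys, value)
    return list(range(start, len(sorted_keys)))

def search_range(sorted_keys, low, high):
    """Positions of all entries with low <= key <= high (search by range)."""
    lo = bisect_left(sorted_keys, low)
    hi = bisect_right(sorted_keys, high)
    return list(range(lo, hi))

equal_hits = search_equal(keys, 7)
greater_hits = search_greater(keys, 18)
range_hits = search_range(keys, 7, 18)
```

In each case the binary search finds the starting position, and the remaining matches are read off in order, which is also why an ordered index can return rows already sorted.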

Although the analysis is of PostgreSQL, the concepts are general, and other databases that use a B-tree structure work in a similar way…


#b-tree #data #database #db #index #postgresql #query #rdbms #search #structure #tree

Many software engineers use database indexes every day, but few of us really understand how they work. In this post I’ll explain how indexing works in Postgres using B-Trees, what B-Trees are, and why they are a good fit for this problem.

Indexes in Postgres

Postgres actually offers 4 different kinds of indexes for different use cases. In this post I’ll be focusing on the “normal” index, the kind you get by default when you run create index.

#postgresql #postgres #index #b-tree

Something I recently became interested in is map data structures for external memory — i.e. ways of storing indexed data that are optimized for storage on disk. In a typical analysis of algor…

Fantastic summary of these data structures

#b-tree #brt #lsmt #lsm #fractal tree #data structures

Why is your dbms not using your index?

So recently I was trying to learn more about indexes in Postgres, going beyond the tutorials I've done, so I attempted to make some queries faster using b-tree indexes on fish count data.

However, before getting into any complex queries I tried out a basic one:

select * from chinook where year > '2000';

When I used explain analyze on this query it gave me:

QUERY PLAN

----------------------------------------------------------------------------------------------

Seq Scan on yearly_chinook_sum_view (cost=0.00..1.15 rows=4 width=40) (actual time=12.559..12.584 rows=12 loops=1)

Filter: ((year)::text > '2000'::text)

Total runtime: 12.669 ms

(3 rows)

So why is it still doing a sequential scan?

First I checked that I really had added a b-tree index to the chinook table by running \d chinook, which confirmed that yes, I have a b-tree index on the year column.

Then I started searching for the issue and found these articles:

http://www.depesz.com/2010/09/09/why-is-my-index-not-being-used/

http://www.postgresonline.com/journal/archives/78-Why-is-my-index-not-being-used.html

There were a few possible reasons why my query wasn't using the index, but the one that seemed most likely to me was table size: on a very small table, an index scan isn't any cheaper than simply reading the whole table.
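To make the table-size argument concrete, here is a rough sketch of how a sequential scan estimate is built from PostgreSQL's default planner cost constants (seq_page_cost, cpu_tuple_cost, cpu_operator_cost). The constants are the documented defaults, but the page and row counts are assumptions inferred from the first plan quoted above; the post does not show them. The function name is hypothetical.

```python
# PostgreSQL's documented default planner cost constants.
SEQ_PAGE_COST = 1.0         # cost of reading one page sequentially
CPU_TUPLE_COST = 0.01       # cost of processing one row
CPU_OPERATOR_COST = 0.0025  # cost of evaluating one operator on one row

def seq_scan_cost(pages, rows, filter_operators=1):
    """Estimated total cost of a sequential scan with a simple filter."""
    io_cost = pages * SEQ_PAGE_COST
    cpu_cost = rows * (CPU_TUPLE_COST + filter_operators * CPU_OPERATOR_COST)
    return io_cost + cpu_cost

# Assumption: the relation in the first plan fits in 1 page and holds 12 rows.
estimated = seq_scan_cost(pages=1, rows=12)
print(f"{estimated:.2f}")
```

Under those assumptions the estimate comes out at 1.15, the same total cost shown in the first plan. When the whole relation fits in a single page, an index scan would still have to read at least one index page plus the table page, so it is unlikely to be cheaper.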

Next step, how big is the Chinook table?

https://wiki.postgresql.org/wiki/Disk_Usage

So from the wiki I used this query to find my total disk usage including indexes:

SELECT nspname || '.' || relname AS "relation",
       pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  AND C.relkind <> 'i'
  AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 20;

And for Chinook it gave me:

public.chinook | 160 kB

So this is fairly small, but I don't really feel like doing tests to see where the threshold is. What I did test were queries that would return a small share of the rows versus a larger percentage of rows.

kenai_fishruns=# explain analyze select * from chinook where year = '2009';

QUERY PLAN

----------------------------------------------------------------------------------------------

Index Scan using chinook_year_index on chinook (cost=0.28..9.67 rows=48 width=44) (actual time=30.929..31.151 rows=48 loops=1)

Index Cond: ((year)::text = '2009'::text)

Total runtime: 31.244 ms

(3 rows)

VERSUS:

kenai_fishruns=# explain analyze select * from chinook where year > '2009';

QUERY PLAN

----------------------------------------------------------------------------------------------

Seq Scan on chinook (cost=0.00..13.45 rows=203 width=44) (actual time=0.033..1.258 rows=203 loops=1)

Filter: ((year)::text > '2009'::text)

Rows Removed by Filter: 393

Total runtime: 1.350 ms

(4 rows)

So this started pointing me in the right direction. I figured it had to do with the number of rows returned rather than some error in how I indexed my tables, so I googled and Stack Overflow came to the rescue again:

http://stackoverflow.com/questions/5203755/why-does-postgresql-perform-sequential-scan-on-indexed-column

Basically, the answer to this post says that if a query returns more than roughly 5-10% of the total rows, then Postgres will use a sequential scan rather than an index scan.

This makes sense with my little test: the first query returned 48 rows, while the second returned 203 of the table's 596 rows (203 returned plus 393 removed by the filter). Unfortunately this Stack Overflow post didn't cite where to find this info in the docs, but that's research for another day. At least now I can rest easy knowing the general reason why Postgres chose a sequential scan over an index scan.
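The row counts in the two plans above can be turned into fractions of the table with a few lines of Python. This is only arithmetic on the numbers printed by explain analyze. The 10% cutoff is the upper end of the rule of thumb from the Stack Overflow answer, not a setting inside Postgres, and the names are hypothetical.

```python
def fraction_returned(rows_returned, total_rows):
    """Share of the table's rows that a query returns."""
    return rows_returned / total_rows

# Total rows in chinook, taken from the second plan:
# 203 rows returned + 393 rows removed by the filter.
total_rows = 203 + 393

equality_fraction = fraction_returned(48, total_rows)     # year = '2009'
inequality_fraction = fraction_returned(203, total_rows)  # year > '2009'

# Upper end of the 5-10% rule of thumb; an assumption, not a planner constant.
RULE_OF_THUMB_LIMIT = 0.10

print(f"year = '2009': {equality_fraction:.1%}")
print(f"year > '2009': {inequality_fraction:.1%}")
```

The equality query returns about 8% of the table, which sits inside the 5-10% band, while the inequality query returns about 34%, well above it. The planner's real decision comes from cost estimates rather than a fixed percentage, so this is a sanity check on the rule of thumb and nothing more.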

#postgresql #sqlqueries #database #b-tree

MySQL Resultset Order Changes?

I was approached today with a rather interesting question: when selecting from MySQL without explicit sorting, the result set is delivered in some "order" that people have grown accustomed to. All of a sudden the order changed dramatically; what could have been the cause of that?

Well, the answer is actually very simple: InnoDB stores data on disk along the lines of a clustered index, which in turn is built around a B-Tree. When rows are inserted without an auto-increment field as the primary key, each new row is placed according to its key value, so parts of the tree might get reorganized. This, in turn, may heavily impact the ordering of the result set, since no order is guaranteed without an explicit ORDER BY.
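The effect can be illustrated with a toy Python model. It assumes a sorted list of (key, payload) pairs as a stand-in for a clustered index; it is not InnoDB's actual page layout, and all names and values are hypothetical. Rows are kept in key order, so a full scan returns them by key and not by insertion time.

```python
from bisect import insort

def full_scan(clustered_rows):
    """Read payloads in storage order, like an unsorted SELECT might."""
    return [payload for _key, payload in clustered_rows]

# Toy clustered table: rows are stored sorted by primary key.
table = []
for key, payload in [(30, "third"), (10, "first"), (20, "second")]:
    insort(table, (key, payload))

order_before = full_scan(table)

# A new row with a key that is not the largest lands in the middle.
insort(table, (15, "newcomer"))
order_after = full_scan(table)

print(order_before)
print(order_after)
```

With an ever-increasing key, new rows would always be appended at the end and the observed order would look stable. With arbitrary key values, a new row can appear anywhere in the scan, which is one way the "usual" order can shift.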

#mysql #indexing #innodb #b-tree

