PostgreSQL: Unique key for a distributed database. Practice
A follow-up to the article "Unique key in a distributed database".
We have a database that we want to shard. Ideally, we would like master-master replication. One of the hardest parts is ensuring the uniqueness of keys across all servers. It would be nice if the database had been designed for scaling from the start... but let's be honest, that is not often the case.
So we have a database that needs to be prepared for master-master synchronization: we will make all keys in the database unique within the project.
The previous article considered several options; here we will focus on the one proposed by Instagram.
Step 1 — Convert all keys to bigint
The assumption here is that all our primary keys are named id, and the fields that reference them are named accordingly: order_id, client_id, table_id, and so on.
First, create a function that converts an integer column to bigint:
DROP FUNCTION IF EXISTS "field_int2big" (field text, tablename text, table_schema text);
CREATE OR REPLACE FUNCTION "field_int2big" (field text, tablename text, table_schema text) RETURNS bigint AS
$body$
DECLARE
BEGIN
EXECUTE 'ALTER TABLE '|| table_schema || '."'|| tablename || '" ALTER COLUMN "'|| field || '" TYPE bigint;' ;
return 1;
END;
$body$ LANGUAGE 'plpgsql';
Then select all integer fields and convert them:
select c.*, field_int2big(column_name, table_name, table_schema) from
(select table_catalog, table_schema, table_name, column_name, data_type
from information_schema.columns where table_schema in ('public', 'myscheme')
and data_type in ('integer', 'oid') and (position('id' in column_name)>0 OR column_name in ('key', 'myfield'))
order by table_catalog, table_schema, table_name, column_name
limit 10 offset 0) c
A few things to pay attention to:
- You can (and should) add your own schemas: table_schema in ('public', 'myscheme')
- You can also add your fields that are not named the "standard" way: column_name in ('key', 'myfield')
- Note the limit 10: for large tables you may need to reduce it, down to 1 — a type change takes time, and not a little
- The query should be run several times; each time it will find the remaining unconverted fields. A quick way to check progress is shown below.
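A minimal self-check, reusing the filter from the conversion query above (adjust the schema and column lists to your own) — it returns 0 when nothing is left to convert:
select count(*) from information_schema.columns
where table_schema in ('public', 'myscheme')
and data_type in ('integer', 'oid')
and (position('id' in column_name)>0 OR column_name in ('key', 'myfield'))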
Step 2 — Convert functional indexes that explicitly reference the type
If you skip this step, it will cause problems in the future that are extremely difficult to diagnose: the index throws an error that is not visible in the query being executed.
DROP FUNCTION IF EXISTS "index_int2big" (idx text, declare_idx text);
CREATE OR REPLACE FUNCTION "index_int2big" (idx text, declare_idx text) RETURNS text AS
$body$
DECLARE
new_idx text;
BEGIN
EXECUTE 'DROP INDEX IF EXISTS' || idx;
SELECT replace(declare_idx, 'integer', 'bigint') INTO new_idx;
EXECUTE new_idx ;
return new_idx;
END;
$body$ LANGUAGE 'plpgsql';
select c.*, index_int2big(indname, inddef) from
(SELECT n.nspname as table_schema, c.relname as table_name, c2.relname AS indname, i.indisprimary,
i.indisunique, i.indisclustered, pg_catalog.pg_get_indexdef(i.indexrelid, 0, true) AS inddef
FROM pg_catalog.pg_class c, pg_catalog.pg_class c2, pg_catalog.pg_index i, pg_namespace n
WHERE n.oid=c.relnamespace and c.oid = i.indrelid AND i.indexrelid = c2.oid
and n.nspname in ('bucardo', 'public')
and position('integer' in pg_catalog.pg_get_indexdef(i.indexrelid, 0, true))>0
limit 10 offset 0) c
You may notice that I also scan the bucardo schema. If you already run replication based on that technology, this step becomes extremely important. The notes from the previous step apply here as well.
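The same kind of self-check works here: the filter from the index query above, reduced to a count, returns 0 once every index has been rebuilt:
select count(*)
from pg_catalog.pg_index i
join pg_catalog.pg_class c on c.oid = i.indrelid
join pg_namespace n on n.oid = c.relnamespace
where n.nspname in ('bucardo', 'public')
and position('integer' in pg_catalog.pg_get_indexdef(i.indexrelid, 0, true)) > 0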
Step 3 — Create new sequences for all key fields
In the variant proposed by Instagram, a unique number is used for each schema/server, i.e. each schema must carry its own copy of the function with a hard-coded unique ID. I changed the function slightly so that it derives the unique shard ID from the server's IP address.
CREATE OR REPLACE FUNCTION inet2num(inet) RETURNS numeric AS $$
DECLARE
    -- Split an IPv4 address into its four octets.
    a text[] := string_to_array(host($1), '.');
BEGIN
    -- Pack the octets into a single number (base 256).
    RETURN a[1]::numeric * 16777216 +
           a[2]::numeric * 65536 +
           a[3]::numeric * 256 +
           a[4]::numeric;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;
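For example, for a server listening on the made-up address 192.168.0.5, the function simply packs the four octets into one number:
select inet2num('192.168.0.5'::inet);
-- 192*16777216 + 168*65536 + 0*256 + 5 = 3232235525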
DROP FUNCTION IF EXISTS next_id(tbl text, tableschema text);
CREATE OR REPLACE FUNCTION next_id(tbl text, tableschema text = 'public') RETURNS bigint AS $$
DECLARE
    our_epoch bigint := 1314220021721;
    seq_id bigint;
    now_millis bigint;
    shard_id bigint;
    result bigint;
BEGIN
    -- Per-table sequence value, truncated to the 10-bit field.
    SELECT nextval(tableschema || '."' || tbl || '_id_seq"') % 1024 INTO seq_id;
    -- Current time in milliseconds since the Unix epoch.
    SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
    /* Alternative: derive the shard ID from a hash of the database name and server address:
       select substring(regexp_replace(md5(current_database()||inet_server_addr()||version()), '[^\d]+', '', 'g')::text from 1 for 6)::int into shard_id; */
    -- Shard ID derived from the server's IPv4 address.
    -- Note: the bit layout reserves 13 bits for the shard, while an IPv4-derived value can be wider.
    SELECT inet2num(inet_server_addr()) INTO shard_id;
    result := (now_millis - our_epoch) << 23;
    result := result | (shard_id << 10);
    result := result | (seq_id);
    RETURN result;
END;
$$ LANGUAGE plpgsql;
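The generated key packs three fields into one bigint: the millisecond timestamp in the high bits, the shard ID in the middle 13 bits, and the sequence number in the low 10 bits. A sketch of how to unpack such an ID — it assumes the shard ID fits in its 13-bit field, as in Instagram's original layout (an IP-derived value can be wider), and mytable is a hypothetical table with a mytable_id_seq sequence:
select to_timestamp(((id >> 23) + 1314220021721) / 1000.0) as created_at, -- timestamp field
(id >> 10) & 8191 as shard_id, -- 13-bit shard field
id & 1023 as seq_id -- 10-bit sequence field
from (select next_id('mytable') as id) t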
Now we need to call it for all of our IDs:
DROP FUNCTION IF EXISTS "reset_nextid" (tablename text, tableschema text);
CREATE OR REPLACE FUNCTION "reset_nextid" (tablename text, tableschema text) RETURNS bigint AS
$body$
DECLARE
id_type text;
BEGIN
SELECT data_type from information_schema.columns c where "table_schema"=tableschema and 'table_name' =tablename and column_name='id' INTO id_type;
IF id_type <> 'bigint' THEN
EXECUTE 'ALTER TABLE '|| tableschema || '."'|| tablename || '" ALTER COLUMN id TYPE bigint;' ;
END IF;
EXECUTE 'ALTER TABLE '|| tableschema || '."'|| tablename || '" ALTER COLUMN id SET DEFAULT next_id("'|| tablename || "', "'|| tableschema || "');';
return next_id(tablename, tableschema);
END;
$body$ LANGUAGE 'plpgsql';
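For a single table, the helper can be called directly; here public.orders is a made-up example of a table that has an orders_id_seq sequence:
select reset_nextid('orders', 'public');
The query below then applies it to every table whose id column does not yet use next_id: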
select t.*, reset_nextid(table_name, table_schema) from (
select t.table_schema, t.table_name, sequence_name, c.column_name from information_schema.sequences s
left join information_schema.tables t on split_part(sequence_name, '_id_seq',1)=table_name
left join information_schema.columns c on (t.table_schema=c.table_schema and t.table_name=c.table_name and c.column_name='id')
where c.column_name is not null and position('next_id' in c.column_default)<>1 and s.sequence_schema=t.table_schema
and t.table_schema in ('public', 'acc')
order by t.table_schema, t.table_name limit 50 offset 0
) as t
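To verify which tables were switched, check the column defaults (a quick sanity check; the schema list is the same assumption as above):
select table_schema, table_name, column_default
from information_schema.columns
where column_name = 'id' and column_default like 'next_id%'
and table_schema in ('public', 'acc')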
Since we have already converted the ID columns to bigint, this query should run quickly.
With that, the preparation is done: our database keeps running and is ready to work in parallel.
Bonus.
If something went wrong with next_id, you can go back to the standard sequence:
CREATE OR REPLACE FUNCTION "restore_nextval" (tablename text, tableschema text = 'public') RETURNS bigint AS
$body$
DECLARE
BEGIN
EXECUTE 'ALTER TABLE '|| tableschema || '."'|| tablename || '" ALTER COLUMN id SET DEFAULT nextval("'|| tablename || '_id_seq"::regclass);';
return nextval((tableschema || '."'|| tablename || '_id_seq"')::regclass);
END;
$body$ LANGUAGE 'plpgsql';
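As with reset_nextid, it can be run for one table first (again the made-up orders table; the schema defaults to public):
select restore_nextval('orders');
Or for all converted tables at once: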
select t.*, restore_nextval(table_name, table_schema) from (
select t.table_schema, t.table_name, sequence_name, c.column_name from information_schema.sequences s
left join information_schema.tables t on split_part(sequence_name, '_id_seq',1)=table_name
left join information_schema.columns c on (t.table_schema=c.table_schema and t.table_name=c.table_name and c.column_name='id')
where c.column_name is not null and position('next_id' in c.column_default)>0 and s.sequence_schema=t.table_schema
and t.table_schema in ('public', 'acc')
) as t
Thank you. I hope this was helpful to someone.
P.S. Don't try to do this on a 32-bit server. Upgrade the server first. And the application server too.