PostgreSQL: Unique key for a distributed database. Practice

next article "Unique key in a distributed database".

We have a database which we want to share. Ideally, I want to do master-master. One of the most difficult moments, is to ensure the uniqueness of keys on all servers. And well if the database was originally designed for scaling... Again, this is something from the realm of the ideal, which is, let's just say not often.

So we have a base which is necessary to prepare for synchronization of master-master will make all keys in our database are unique within a project.

The article has considered several options, but we will focus on one proposed Instagram

Step 1 — Translate all keys in bigint

The implication here is that all our primary keys are named id and fields respectively which refer to these keys named like this: order_id, client_id, table_id,

Create a function which translates the integer bigint
the

DROP FUNCTION IF EXISTS "field_int2big" (field text, tablename text, table_schema text);
CREATE OR REPLACE FUNCTION "field_int2big" (field text, tablename text, table_schema text) RETURNS bigint AS 
$body$ 
DECLARE 
BEGIN 
EXECUTE 'ALTER TABLE '|| table_schema || '."'|| tablename || '" ALTER COLUMN "'|| field || '" TYPE bigint;' ;
return 1;
END; 
$body$ LANGUAGE 'plpgsql';

Then select all the integer the fields and their convertiran:
the

select * field_int2big(column_name, table_name, table_schema) from 
(select table_catalog, table_schema, table_name, column_name, data_type
from information_schema.columns where table_schema in ('public', 'myscheme') 
and data_type in ('integer', 'oid') and (position('id' in column_name)>0 OR column_name in ('key', 'myfield'))
order by table_catalog, table_schema, table_name, column_name 
limit 10 offset 0) c

A few things you should pay attention:

You can/must add your schema: table_schema in ('public', 'myscheme')
can Also add your field not named "standard": column_name in ('key', 'myfield')
note the limit 10 for large database tables, you need to reduce it down to 1 — change of type takes time and no small
the Query should be run several times, each time he will find the remaining non-translated fields

Step 2 — Transfer of functional indexes, which is a direct indication of the type

In General, if you do not — will bring problems in the future, they are extremely difficult to detect: produces an error which is not visible in the executable query.

the

DROP FUNCTION IF EXISTS "index_int2big" (idx text, declare_idx text);
CREATE OR REPLACE FUNCTION "index_int2big" (idx text, declare_idx text) RETURNS text AS 
$body$ 
DECLARE 
new_idx text;
BEGIN 
EXECUTE 'DROP INDEX IF EXISTS' || idx;
SELECT replace(declare_idx, 'integer', 'bigint') INTO new_idx;
EXECUTE new_idx ;
return new_idx;
END; 
$body$ LANGUAGE 'plpgsql';

select * index_int2big(indname, inddef) from 
(SELECT n.nspname as table_schema, c.relname as table_name, c2.relname AS indname, i.indisprimary, 
i.indisunique, i.indisclustered, pg_catalog.pg_get_indexdef(i.indexrelid, 0, true) AS inddef
FROM pg_catalog.pg_class c, pg_catalog.pg_class c2, pg_catalog.pg_index i, pg_namespace n
WHERE n.oid=c.relnamespace and c.oid = i.indrelid AND i.indexrelid = c2.oid
and n.nspname in ('bucardo', 'public')
and position('integer' in pg_catalog.pg_get_indexdef(i.indexrelid, 0, true))>0
limit 10 offset 0) c

You may notice that I check in the schema bucardo. If you are already running a synchronization on the basis of this technology, then this step becomes extremely important. Also see the notes from the last step.

Step 3 — Create new sequences for all key fields

In the proposed embodiment, Instagram is used a unique number for each schema/server.
i.e. it is necessary to have in each diagram its function with a unique ID.
I slightly changed the function and generate a unique key for the server IP.

the

CREATE OR REPLACE FUNCTION inet2num(inet) RETURNS numeric AS $$ 
DECLARE 
a text[] := string_to_array(host($1), '.'); 
BEGIN 
RETURN a[1]::numeric * 16777216 + 
a[2]::numeric * 65536 + 
a[3]::numeric * 256 + 
a[4]::numeric; 
END; 
$$ LANGUAGE plpgsql IMMUTABLE STRICT; 


DROP FUNCTION IF EXISTS next_id(tbl text, tableschema text);
CREATE OR REPLACE FUNCTION next_id(tbl text, tableschema text = 'public') returns bigint AS $$
DECLARE
our_epoch bigint := 1314220021721;
seq_id bigint;
now_millis bigint;
shard_id bigint;
result bigint;
BEGIN
SELECT nextval(tableschema||'."' || tbl || '_id_seq"') % 1024 INTO seq_id;
/* select substring(regexp_replace(md5(current_database()||inet_server_addr()||version()), '[^\\\d]+', ", 'g')::text from 1 for 6)::int into shard_id;*/
SELECT inet2num(inet_server_addr()) into shard_id;

result := (now_millis - our_epoch) << 23;
result := result | (shard_id < < 10);
result := result | (seq_id);
RETURN result;
END;
$$ LANGUAGE PLPGSQL;

Now we have to call her on all of our IDs:
the

DROP FUNCTION IF EXISTS "reset_nextid" (tablename text, tableschema text);
CREATE OR REPLACE FUNCTION "reset_nextid" (tablename text, tableschema text) RETURNS bigint AS 
$body$ 
DECLARE 
id_type text;
BEGIN 
SELECT data_type from information_schema.columns c where "table_schema"=tableschema and 'table_name' =tablename and column_name='id' INTO id_type;
IF id_type <> 'bigint' THEN
EXECUTE 'ALTER TABLE '|| tableschema || '."'|| tablename || '" ALTER COLUMN id TYPE bigint;' ;
END IF;

EXECUTE 'ALTER TABLE '|| tableschema || '."'|| tablename || '" ALTER COLUMN id SET DEFAULT next_id("'|| tablename || "', "'|| tableschema || "');';
return next_id(tablename, tableschema);
END; 
$body$ LANGUAGE 'plpgsql';

select t.*, reset_nextid(table_name, table_schema) from (
select t.table_schema, t.table_name, sequence_name, c.column_name from information_schema.sequences s
left join information_schema.tables t on split_part(sequence_name, '_id_seq',1)=table_name
left join information_schema.columns c on (t.table_schema=c.table_schema and t.table_name=c.table_name and c.column_name='id')
where c.column_name is not null and position('next_id' in c.column_default)<>1 and s.sequence_schema=t.table_schema 
and t.table_schema in ('public', 'acc') 
order by t.table_schema, t.table_name limit 50 offset 0
) as t

Since we already changed the ID type to bigint, the query must work quickly.

This training was done, our database is running and ready to work in parallel.

Bonus.

If something went wrong with next_id, you can go back to the standard sequence:
the

CREATE OR REPLACE FUNCTION "restore_nextval" (tablename text, tableschema text = 'public') RETURNS bigint AS 
$body$ 
DECLARE 
BEGIN 
EXECUTE 'ALTER TABLE '|| tableschema || '."'|| tablename || '" ALTER COLUMN id SET DEFAULT nextval("'|| tablename || '_id_seq"::regclass);';
return nextval((tableschema || '."'|| tablename || '_id_seq"')::regclass);
END; 
$body$ LANGUAGE 'plpgsql';

select t.*, restore_nextval(table_name, table_schema) from (
select t.table_schema, t.table_name, sequence_name, c.column_name from information_schema.sequences s
left join information_schema.tables t on split_part(sequence_name, '_id_seq',1)=table_name
left join information_schema.columns c on (t.table_schema=c.table_schema and t.table_name=c.table_name and c.column_name='id')
where c.column_name is not null and position('next_id' in c.column_default)>0 and s.sequence_schema=t.table_schema 
and t.table_schema in ('public', 'acc')

Thank you. I hope someone was helpful.

P. S. don't try to do it on a 32 bit server. First update the server. And application server too.

Article based on information from habrahabr.ru

Поиск по этому блогу

computer express