Clickhouse insert datetime python

How to insert a DF into a clickhouse table, if data type is uuid and list of datetime?


I have a df as a result of a sql query, rows are as follows:

UserID: 6f6e526c-acd8-496f-88e0-6bfdd594e2c2
ObjectID: 6095016
ObjectClass: cottages
Views: [datetime.datetime(1970, 1, 1, 0, 0)]
RecDate: 2021-07-13 15:50:32
Events: ['']

CREATE TABLE default.rec_eval_data
(
    `UserID` UUID,
    `ObjectID` Int32,
    `ObjectClass` String,
    `Views` Array(DateTime),
    `RecDate` DateTime,
    `Events` Array(String)
)
ENGINE = ReplacingMergeTree
ORDER BY (UserID, ObjectID)
SETTINGS index_granularity = 8192

I've tried different ways to insert the DataFrame into the table, but I keep getting various errors.

I'm using the clickhouse_driver library.

I have read this thread and tried all the methods in it, but to no avail.

What I have tried already:

Error: clickhouse_driver.errors.TypeMismatchError: Code: 53. Type mismatch in VALUES section. Repeat query with types_check=True for detailed info. Column Views: argument out of range

  1. Pandahouse:

    connection = dict(database='default', host='http://localhost:8123', user='default', schema='default', password='')
    ph.to_clickhouse(data, 'rec_eval_data', index=False, chunksize=100000, connection=connection)

Error: It uses HTTP and the GET method, which automatically runs in read-only mode, so I could not proceed. Maybe there's a way to change the method to POST?

  2. clickhouse_driver insert_dataframe:

    client.insert_dataframe('INSERT INTO rec_eval_data VALUES ', data)

Error: TypeError: Unsupported column type: . list or tuple is expected.

  3. Iteration:

    for date, row in data.T.iteritems():
        client.execute("INSERT INTO rec_eval_data "
                       "(UserID, "
                       "ObjectID, "
                       "ObjectClass, "
                       "Views, "
                       "RecDate, "
                       "Events)"
                       " VALUES "
                       "({UserID}, "
                       "{ObjectID}, "
                       "{ObjectClass}, "
                       "{Views}, "
                       "{RecDate}, "
                       "{Events}) "
                       .format(UserID=UserID,
                               ObjectID=row['ObjectID'],
                               ObjectClass=row['ObjectClass'],
                               Views=row['UserID'],
                               RecDate=row['RecDate'],
                               Events=row['Events']))

Error: It tries to split UserID into pieces, and I can't find how to avoid it: DB::Exception: Missing columns: '6bfdd594e2c2' '496f' 'acd8' '6f6e526c' while processing query: '(((6f6e526c - acd8) - 496f) - 88.) - 6bfdd594e2c2', required columns: '6f6e526c' 'acd8' '496f' '6bfdd594e2c2' '6f6e526c' 'acd8' '496f' '6bfdd594e2c2'.

Please help, I can't fix it. I'm new to both ClickHouse and pandas.

Note: this code works only when an empty array is passed for the Events Array(String) column; otherwise it fails with "AttributeError: 'list' object has no attribute 'tolist'" (this looks like a bug in clickhouse_driver).

from datetime import datetime
from uuid import UUID

from clickhouse_driver import Client
import pandas as pd

# insert_dataframe requires the use_numpy setting to be enabled
client = Client(host='localhost', settings={'use_numpy': True})


def get_inserted_data():
    return [
        {
            'UserID': UUID('417ddc5d-e556-4d27-95dd-a34d84e40003'),
            'ObjectID': 1003,
            'ObjectClass': 'Class3',
            'Views': [datetime.now(), datetime.now()],
            'RecDate': datetime.now(),
            # 'Events': ['aa', 'bb']  # got error "AttributeError: 'list' object has no attribute 'tolist'"
            'Events': []
        }]


# Turn each dict into a row whose values follow the table's column order
data = []
for item in get_inserted_data():
    data.append([
        item['UserID'],
        item['ObjectID'],
        item['ObjectClass'],
        item['Views'],
        item['RecDate'],
        item['Events']
    ])

client.insert_dataframe(
    'INSERT INTO test.rec_eval_data VALUES',
    pd.DataFrame(data, columns=['UserID', 'ObjectID', 'ObjectClass', 'Views', 'RecDate', 'Events'])
)

Alternatively, pass the rows to client.execute as a list of dicts:

from clickhouse_driver import Client
from iso8601 import iso8601

client = Client(host='localhost')

# Each dict is one row; keys match the column list in the INSERT statement
client.execute(
    'INSERT INTO test.rec_eval_data (UserID, ObjectID, ObjectClass, Views, RecDate, Events) VALUES',
    [{
        'UserID': '417ddc5d-e556-4d27-95dd-a34d84e40002',
        'ObjectID': 1002,
        'ObjectClass': 'Class2',
        'Views': [iso8601.parse_date('2021-08-02 01:00:00'), iso8601.parse_date('2021-08-03 01:00:00')],
        'RecDate': iso8601.parse_date('2021-08-02 01:00:00'),
        'Events': ['03', '04']
    }])
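
If the rows come from a DataFrame rather than hand-written dicts, the cells may not be plain Python objects. The following is only a sketch under that assumption: normalize_value and normalize_records are hypothetical helpers (not part of clickhouse_driver) that turn anything exposing tolist() (as NumPy arrays do) into a list and anything exposing to_pydatetime() (as pandas Timestamps do) into a datetime, so the result has the same shape as the list of dicts above.

```python
from datetime import datetime


def normalize_value(value):
    """Convert one cell to plain Python objects (hypothetical helper)."""
    # pandas.Timestamp exposes to_pydatetime(); a plain datetime does not.
    if hasattr(value, "to_pydatetime"):
        return value.to_pydatetime()
    # numpy.ndarray exposes tolist(); plain lists and tuples do not.
    if hasattr(value, "tolist"):
        value = value.tolist()
    # Normalize every item of a sequence and always return a list.
    if isinstance(value, (list, tuple)):
        return [normalize_value(item) for item in value]
    return value


def normalize_records(records):
    """Apply normalize_value to every cell of every row dict."""
    return [
        {name: normalize_value(cell) for name, cell in record.items()}
        for record in records
    ]


# Stand-in for df.to_dict('records'); the tuple plays the role of an array cell.
rows = normalize_records([{
    "UserID": "417ddc5d-e556-4d27-95dd-a34d84e40002",
    "ObjectID": 1002,
    "ObjectClass": "Class2",
    "Views": (datetime(2021, 8, 2, 1, 0), datetime(2021, 8, 3, 1, 0)),
    "RecDate": datetime(2021, 8, 2, 1, 0),
    "Events": ["03", "04"],
}])
print(rows[0]["Views"])
```

With a real DataFrame, the input would be df.to_dict('records') and the result would be passed as the second argument of client.execute, as in the example above. Whether this avoids the errors from the question depends on the actual cell types in the DataFrame, which the question does not show.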

Or send the data over the HTTP interface with a POST request:

import requests

CH_USER = 'default'
CH_PASSWORD = ''
SSL_VERIFY = False

host = 'http://localhost:8123'
db = 'test'
table = 'rec_eval_data'

# Header line with column names, then one tab-separated row
content = 'UserID\tObjectID\tObjectClass\tViews\tRecDate\tEvents' \
          '\n417ddc5d-e556-4d27-95dd-a34d84e46a50\t1001\tClass1\t[\'2021-08-01 00:00:00\',\'2021-08-02 00:00:00\']\t2021-08-01 00:00:00\t[\'01\',\'02\']'
content = content.encode('utf-8')

query_dict = {
    'query': 'INSERT INTO ' + db + '.' + table + ' FORMAT TabSeparatedWithNames '
}

# The INSERT statement goes in the query string, the data in the POST body
r = requests.post(host, data=content, params=query_dict, auth=(CH_USER, CH_PASSWORD), verify=SSL_VERIFY)
print(r.text)
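
The content string above is written by hand. As a rough sketch, the same payload could be built from Python values. format_value and build_payload are hypothetical names, and the sketch assumes string values contain no tabs, newlines, backslashes or single quotes (it raises instead of escaping them).

```python
from datetime import datetime

# Characters that would need escaping in TabSeparated; not handled in this sketch.
_UNSAFE = set("\t\n\\'")


def _plain(text):
    """Return text unchanged, refusing characters this sketch does not escape."""
    if _UNSAFE & set(text):
        raise ValueError(f"value needs escaping, not handled here: {text!r}")
    return text


def format_value(value, in_array=False):
    """Render one Python value as TabSeparated text."""
    if isinstance(value, datetime):
        text = value.strftime("%Y-%m-%d %H:%M:%S")
    elif isinstance(value, (list, tuple)):
        # Arrays are written as [item,item]; strings and dates inside are quoted.
        return "[" + ",".join(format_value(v, in_array=True) for v in value) + "]"
    elif isinstance(value, str):
        text = _plain(value)
    else:
        # Numbers and other scalars are written via str().
        return str(value)
    return f"'{text}'" if in_array else text


def build_payload(columns, rows):
    """Build a TabSeparatedWithNames body: a header line, then one line per row."""
    lines = ["\t".join(columns)]
    lines += ["\t".join(format_value(v) for v in row) for row in rows]
    return "\n".join(lines).encode("utf-8")


payload = build_payload(
    ["UserID", "ObjectID", "ObjectClass", "Views", "RecDate", "Events"],
    [[
        "417ddc5d-e556-4d27-95dd-a34d84e46a50", 1001, "Class1",
        [datetime(2021, 8, 1), datetime(2021, 8, 2)],
        datetime(2021, 8, 1),
        ["01", "02"],
    ]],
)
```

The resulting payload can be passed as data= to requests.post in place of content.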


How to change the column’s type from numeric to array in Clickhouse

I have a column in a table with data type Int32. Is it possible to convert the column into the array data type Array(Int32)? If not, what are the other options?

Changing the type of a column from int to Array(int) cannot be done with an ALTER TABLE .. MODIFY COLUMN query, because such a type cast is not allowed.

So you need to follow these steps:

  1. Add the new column with type Array(int):

ALTER TABLE test.test_004 ADD COLUMN `value_array` Array(int);

/* Test table preparation:
CREATE TABLE test.test_004 (`id` int, `value` int) ENGINE = MergeTree() ORDER BY id;
INSERT INTO test.test_004 VALUES (1, 10), (2, 20), (3, 30), (4, 40);
*/

  2. Update the array-column value:

ALTER TABLE test.test_004 UPDATE value_array = [value] WHERE 1

/* Result
┌─id─┬─value─┬─value_array─┐
│  1 │    10 │ [10]        │
│  2 │    20 │ [20]        │
│  3 │    30 │ [30]        │
│  4 │    40 │ [40]        │
└────┴───────┴─────────────┘
*/

Important: depending on the number of rows in the table, this operation can take a long time to execute. To check the status of the update (mutation) or find the reason for a failure, query the system.mutations table:

SELECT * FROM system.mutations WHERE table = 'test_004'

  3. Once the mutation has completed, drop the original column:

ALTER TABLE test.test_004 DROP COLUMN value

Remark: add an ON CLUSTER clause to each query if the table is located on several servers.
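
The steps and the ON CLUSTER remark can be put together in a small sketch. build_migration and mutation_status_query are hypothetical helpers, not ClickHouse or clickhouse_driver functions. They only build the SQL strings, so the table, column and cluster names are interpolated as-is and must come from a trusted source.

```python
def build_migration(table, column, array_column, item_type="Int32", cluster=None):
    """Return the three ALTER statements, in the order they should be run."""
    # ON CLUSTER goes right after the table name and is omitted for a single server.
    on_cluster = f" ON CLUSTER {cluster}" if cluster else ""
    alter = f"ALTER TABLE {table}{on_cluster}"
    return [
        f"{alter} ADD COLUMN `{array_column}` Array({item_type})",
        f"{alter} UPDATE {array_column} = [{column}] WHERE 1",
        f"{alter} DROP COLUMN {column}",
    ]


def mutation_status_query(database, table):
    """Query to run between the UPDATE and the DROP to see whether the mutation finished."""
    return (
        "SELECT mutation_id, is_done, latest_fail_reason FROM system.mutations "
        f"WHERE database = '{database}' AND table = '{table}'"
    )


statements = build_migration("test.test_004", "value", "value_array")
for statement in statements:
    print(statement)
print(mutation_status_query("test", "test_004"))
```

Each statement would then be executed one at a time (for example with client.execute), waiting until is_done is 1 for the UPDATE mutation before running the final DROP COLUMN.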


Data type for null and string with \ in Clickhouse

Above is the JSON I am getting from Kafka. I am able to CREATE TABLE using most of the keys; I just want to know what data type I should use for KEYS_INSTANCE_ID to create the table in ClickHouse using the MergeTree and Kafka engines with a materialized view. I tried String, but it didn't work for me.

#to create table using mergetree engine:

CREATE TABLE IF NOT EXISTS readings_hb_trial_11
(
    KEYS_INSTANCE_ID String
)
ENGINE = MergeTree
ORDER BY KEYS_INSTANCE_ID

#to create table using kafka engine:

CREATE TABLE IF NOT EXISTS readings_queue_hb_trial_11
(
    KEYS_INSTANCE_ID String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = '10########2',
         kafka_topic_list = 'R########B',
         kafka_group_name = 'readings_consumer_group3',
         kafka_format = 'JSONEachRow',
         kafka_max_block_size = 1048576;

CREATE MATERIALIZED VIEW readings_queue_mv_hb_trial_11 TO readings_hb_trial_11 AS
SELECT KEYS_INSTANCE_ID
FROM readings_queue_hb_trial_11

I suspect you made a mistake in the JSON by wrapping the array in double quotes; the correct form should look like "KEYS_INSTANCE_ID":["i1"].

In this case, the type Array(String) should help.

/* Emulate the table with Kafka-engine */
CREATE TABLE readings_queue_hb_trial_11
(
    `KEYS_INSTANCE_ID` Array(String)
)
ENGINE = Memory;

/* The MV takes just the first item from the array (as I understand it, that is your case). */
CREATE MATERIALIZED VIEW readings_queue_mv_hb_trial_11 TO readings_hb_trial_11 AS
SELECT empty(KEYS_INSTANCE_ID) ? '' : KEYS_INSTANCE_ID[1] AS KEYS_INSTANCE_ID
FROM readings_queue_hb_trial_11

Emulate processing some messages:

INSERT INTO readings_queue_hb_trial_11
SELECT JSONExtractArrayRaw('', 'KEYS_INSTANCE_ID')
UNION ALL
SELECT JSONExtractArrayRaw('{"KEYS_INSTANCE_ID":[]}', 'KEYS_INSTANCE_ID')
UNION ALL
SELECT JSONExtractArrayRaw('{"KEYS_INSTANCE_ID":["i1", "i2"]}', 'KEYS_INSTANCE_ID');

SELECT * FROM readings_hb_trial_11

┌─KEYS_INSTANCE_ID─┐
│ "i1"             │
│                  │
│ "i1"             │
└──────────────────┘
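
To see what the materialized view's expression does with each message shape, here is a small Python imitation. It is a sketch only: first_instance_id is a hypothetical function, and it additionally assumes that a producer may send the array wrapped in double quotes (the mistake suspected above), in which case the string is decoded a second time. Unlike JSONExtractArrayRaw, which keeps the raw JSON quoting of each item, this returns the decoded string.

```python
import json


def first_instance_id(message):
    """Imitate `empty(arr) ? '' : arr[1]` for one JSON message (SQL arrays are 1-based)."""
    if not message:
        return ""
    value = json.loads(message).get("KEYS_INSTANCE_ID")
    if isinstance(value, str):
        # Array wrapped in double quotes, e.g. "[\"i1\"]": decode it once more.
        value = json.loads(value)
    # A missing key, null, or an empty array all map to the empty string.
    return value[0] if value else ""


messages = [
    '',
    '{"KEYS_INSTANCE_ID":[]}',
    '{"KEYS_INSTANCE_ID":["i1", "i2"]}',
    '{"KEYS_INSTANCE_ID":"[\\"i1\\"]"}',
    '{"KEYS_INSTANCE_ID":null}',
]
for message in messages:
    print(repr(first_instance_id(message)))
```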

