Analyzing Hot and Cold Tables with MaxCompute

Solution Design

The detailed process is explained as follows:

  • For hot data, obtain the details of the input_tables field in the tasks_history table, and then use count to count the number of times each table is used in a certain time window.
  • For cold data, associate and sort by using the total number of jobs in the input_tables in tables and tasks_history. This allows you to count the number of times each table is used within the specified time window and list the tables in ascending order.

Procedure

select 
inst_id ,
input_tables,
output_tables,
start_time,
end_time
from information_schema.tasks_history
where ds='20190902'limit 100;
select
--Remove the brackets ([]) at the beginning and end of the input_tables field
trans_array(1,",",inst_id,replace(replace(input_tables,"[",""),"]","")) as (inst_id,input_table)
from information_schema.tasks_history
--The date can be defined based on actual requirements, in which case 20190902 is used as an example
where ds='20190902' limit 100;
select 
--Calculate statistics by table name
input_table
,count(distinct inst_id) table_read_num
from
(
select
--Remove the brackets ([]) at the beginning and end of the input_tables field
trans_array(1,",",inst_id,replace(replace(input_tables,"[",""),"]","")) as (inst_id,input_table)
from information_schema.tasks_history
where ds='20190902'
) t
group by input_table
order by table_read_num desc
limit 1000;
select
t1.table_schema,
t1.table_name,
--Join two tables
if(t2.table_read_num is null,0,table_read_num) as table_read_num
FROM information_schema.tables t1
left join(
select
--Remove the quotation marks ("") before and after the table name
regexp_replace(t.input_table,""","") as input_table
,count(distinct t.inst_id) table_read_num
from
(
select
--Remove the brackets ([]) at the beginning and end of the input_tables field
trans_array(1,",",inst_id,replace(replace(input_tables,"[",""),"]","")) as (inst_id,input_table)
from information_schema.tasks_history
where ds='20190902' )t
group by input_table
)t2
--Match by using join conditions
on concat('your_project_name.',t1.table_name)=t2.input_table
order by table_read_num desc
limit 1000;

Original Source:

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Alibaba Cloud

Alibaba Cloud

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:https://www.alibabacloud.com