This database collects data on internet censorship, cyber attacks, and social networks among countries for use in future research.
It enables researchers to answer the following questions: How do censorship rules spread across countries that share stronger relations? Why are some countries considered greater threats even though they initiate fewer cyber attacks? How does the recent adoption of AI technology affect the types of censorship and cyber attacks?
Schema
erDiagram
COUNTRY ||--|{ POLICY: country_id
COUNTRY ||--|{ REQUEST: country_id
COUNTRY ||--|{ REQUEST_DETAIL: country_id
COUNTRY ||--|{ CYBER_ATTACK: country_id
COUNTRY ||--|{ EDGELIST: country_id
COUNTRY {
string country_id PK
float covariates
}
POLICY {
string url PK
string country_id FK
datetime issue_time
string title
string text
category document_type
category issuing_entity
category type_of_law
category type_of_liability
}
REQUEST {
string country_id PK
datetime request_time PK
category reason PK
integer n_requests
}
REQUEST_DETAIL {
string country_id PK
datetime request_time PK
category requestor PK
integer n_requests
integer n_items_requested
integer n_removed_legal
integer n_removed_policy
integer n_no_action_taken
}
CYBER_ATTACK {
string source1 PK
string country_id FK
string victim FK
category type
}
EDGELIST {
string country_id FK
string interaction PK
string interaction_strength
}
Methods and Explanation
COUNTRY
The country df contains all country-level covariates. The unit is a country, and country_id is the primary key.
POLICY
The policy df contains all policy-level information from WILMap. The unit is a single policy document, and its unique URL serves as the primary key; country_id is a foreign key. The df also contains policy-level fields such as the text and the labels that have already been classified.
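The key constraints described for COUNTRY and POLICY can be checked mechanically. The following is a minimal sketch, assuming the tables are loaded into SQLite through Python's standard `sqlite3` module rather than kept as dfs; the column types, the country id `AAA`, and the `example.org` URLs are placeholders for illustration, not values from the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE country (
    country_id TEXT PRIMARY KEY,
    covariates REAL
);
CREATE TABLE policy (
    url               TEXT PRIMARY KEY,
    country_id        TEXT NOT NULL REFERENCES country(country_id),
    issue_time        TEXT,
    title             TEXT,
    text              TEXT,
    document_type     TEXT,
    issuing_entity    TEXT,
    type_of_law       TEXT,
    type_of_liability TEXT
);
""")

# Placeholder rows (hypothetical values, for illustration only).
conn.execute("INSERT INTO country VALUES (?, ?)", ("AAA", 0.5))
conn.execute(
    "INSERT INTO policy (url, country_id, title) VALUES (?, ?, ?)",
    ("https://example.org/policy/1", "AAA", "Placeholder policy"),
)


def insert_is_rejected(sql, params):
    """Return True if the insert violates a key constraint."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


# A policy pointing at an unknown country breaks the foreign key.
orphan_rejected = insert_is_rejected(
    "INSERT INTO policy (url, country_id) VALUES (?, ?)",
    ("https://example.org/policy/2", "ZZZ"),
)
# A second policy with the same URL breaks the primary key.
duplicate_url_rejected = insert_is_rejected(
    "INSERT INTO policy (url, country_id) VALUES (?, ?)",
    ("https://example.org/policy/1", "AAA"),
)
policy_count = conn.execute("SELECT COUNT(*) FROM policy").fetchone()[0]
```

Both rejected inserts leave the table unchanged, so only the one valid policy row remains.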
REQUEST and REQUEST_DETAIL
The request and request detail dfs record the takedown requests that governments submitted to Google. The unit of the request df is country-date-reason, with (country_id, request_time, reason) as the primary key; its values are country-date-level request counts for each reason. The request detail df uses (country_id, request_time, requestor) as its primary key, further separating the requests by each country's requesting agency.
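As a sketch of what the composite key implies, the example below declares REQUEST with a three-column primary key and then collapses it back to country-date totals. It assumes SQLite via the standard `sqlite3` module; the dates, the `reason_a`/`reason_b` labels, and the counts are made-up placeholders. REQUEST_DETAIL would follow the same pattern with `requestor` in place of `reason`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE request (
    country_id   TEXT,
    request_time TEXT,
    reason       TEXT,
    n_requests   INTEGER,
    PRIMARY KEY (country_id, request_time, reason)
)
""")

# Placeholder rows (hypothetical values): one row per country-date-reason.
rows = [
    ("AAA", "2020-06-30", "reason_a", 3),
    ("AAA", "2020-06-30", "reason_b", 2),
    ("AAA", "2020-12-31", "reason_a", 1),
]
conn.executemany("INSERT INTO request VALUES (?, ?, ?, ?)", rows)

# Repeating an existing (country_id, request_time, reason) triple is rejected.
try:
    conn.execute(
        "INSERT INTO request VALUES (?, ?, ?, ?)",
        ("AAA", "2020-06-30", "reason_a", 9),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Summing over reasons recovers one total per country-date.
totals = conn.execute("""
SELECT country_id, request_time, SUM(n_requests)
FROM request
GROUP BY country_id, request_time
ORDER BY request_time
""").fetchall()
```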
CYBER_ATTACK
The cyber attack df is merged from two dfs drawn from different sources, each recording cyber attacks. The unit is an incident, and each source's unique URL serves as the primary key. The initiator and victim columns hold country ids and serve as foreign keys.
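One way the merge could be sketched in plain Python is shown below. The function name `merge_attacks`, the "first occurrence wins" rule for incidents that appear in both sources, and all sample values are assumptions made for illustration; the actual merge procedure is not specified here. Records are keyed on `source1` (the source URL), and the initiator (`country_id`) and victim are checked against the known country ids.

```python
def merge_attacks(source_a, source_b, known_countries):
    """Merge two lists of incident records and flag broken foreign keys.

    Each record is a dict with the CYBER_ATTACK columns:
    source1 (URL, primary key), country_id (initiator), victim, type.
    """
    merged = {}
    for record in list(source_a) + list(source_b):
        # Assumption: if both sources report the same URL, keep the first.
        merged.setdefault(record["source1"], record)

    # Incidents whose initiator or victim is not a known country id.
    orphans = [
        r["source1"]
        for r in merged.values()
        if r["country_id"] not in known_countries
        or r["victim"] not in known_countries
    ]
    return list(merged.values()), orphans


# Placeholder data (hypothetical values, for illustration only).
source_a = [
    {"source1": "https://example.org/a/1", "country_id": "AAA",
     "victim": "BBB", "type": "type_x"},
]
source_b = [
    {"source1": "https://example.org/a/1", "country_id": "AAA",
     "victim": "BBB", "type": "type_y"},  # same incident, reported twice
    {"source1": "https://example.org/b/1", "country_id": "AAA",
     "victim": "ZZZ", "type": "type_x"},  # victim not in COUNTRY
]

merged, orphans = merge_attacks(source_a, source_b, {"AAA", "BBB"})
```

Returning the orphaned URLs instead of dropping those rows keeps the decision about how to repair unmatched country ids outside the merge step.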
EDGELIST
The edge list is country-pair social network data that records the interactions between countries and the strength of their relations. The unit is a dyad of two countries. Each row also has a country column that identifies the country whose dyadic relations it stores, which conveniently serves as a primary key.