@INPROCEEDINGS{Hossein-NFV-SDN2021, AUTHOR="Hossein Ahmadvand and Tooska Dargahi and Fouzhan Foroutan and Princewill Okorie and Flavio Esposito", TITLE="Big Data Processing at the Edge with Data Skew Aware Resource Allocation", BOOKTITLE="2021 IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN) (NFV-SDN'21)", ADDRESS="virtual", DAYS=9, MONTH=nov, YEAR=2021, KEYWORDS="Data Skew; Big Data Processing; Resource Allocation; Cloud Computing; Edge Computing", ABSTRACT="With the increasing number of connected devices and the large amount of data they generate, efficient methods are required to deal with the complexities of big data processing, especially within edge computing infrastructures. Processing batch data on the cloud is often too costly; however, edge servers can be used as alternative resources. Various factors make Cloud-Edge resource provisioning a challenging issue, and one of them, data skewness, is the focus of this paper. Some data blocks require more processing resources than others. For example, when processing a log file, the number of repetitions of a target URL varies across different parts of the file, which affects the required processing resources. Existing resource allocation methods ignore the data skew feature in big data processing and apply the same policies to different parts of the input data. In this paper, we propose a data skew aware approach for allocating each data block to either the edge or the cloud server for processing. We reduce the cost of data processing on the cloud by categorizing the data blocks based on their significance (i.e., their impact on the processing output). Then, we assign the blocks that are less significant and require less processing power to the edge, and the rest to the cloud. Our experimental results confirm that our provisioning approach reduces the processing cost by up to 35\% compared to the state of the art." }