Big Data Analytic Techniques and Issues

May 26, 2017 | Author: Thilini Ranasinghe | Category: Big Data Analytics

Big Data Analytic Techniques and Issues

T. H. T. M. Ranasinghe
Faculty of Information Technology, University of Moratuwa

[email protected]

Abstract – Big data analytics is one of the latest research topics because we are now entering the big data era. At its core, big data is about massive amounts of data: organizations generate data in the range of zettabytes and yottabytes, with high volume, high velocity and high variety. Organizations face difficulties when processing this massive data with traditional databases and analyzing it with existing software tools and techniques. This paper briefly describes the attributes of big data and existing big data analytic techniques. It focuses mainly on open source frameworks for analyzing big data, since these can be used by any organization. Further, it discusses the issues and challenges faced in big data analytics.

Keywords – Big Data, Big Data Attributes, Big Data Analytics, Hadoop, R, Cloud Computing

I. INTRODUCTION

Human beings have entered the big data era with the development of new technologies in the twenty-first century and the widespread adoption of the internet. Simply put, big data means data that exceeds the processing capacity of conventional database systems: the data is too big, moves too fast, or does not fit the structures of typical database architectures. These data sets are so complex that they become difficult to process using conventional database management tools. For example, web logs, call records, medical records, military surveillance logs, photography archives, video archives and large-scale e-commerce data can all be considered big data. Organizations now have trouble managing and manipulating such data with traditional database management systems and software tools, because these are unable to process, handle and analyze the massive amounts of data the organization generates.

To gain value from this data, we must choose an alternative way to process it, since its size is beyond the ability of typical database software tools to capture, store, manage and analyze. The process of examining big data to uncover hidden patterns, unknown correlations and other useful information that can be used for decision making is called big data analytics. In big data analytics, we apply advanced analytical techniques to large and diverse data sets that include different types of data, structured and unstructured and of different sizes, exhibiting high volume, high velocity or high variety. Because of the exponential growth of data, there are numerous issues in big data analytics. Section 2 of this review paper gives an overview of big data and big data analytics, covering the important key aspects. Section 3 describes practical applications of big data, and section 4 describes the techniques and tools used for analyzing big data. Section 5 describes big data analytics with cloud computing, section 6 covers different areas of big data analytics, and finally section 7 describes issues and challenges in big data and big data analytic techniques.

II. BIG DATA AND BIG DATA ANALYTICS

Big data is a collection of datasets with massive amounts of data in the range of zettabytes and yottabytes [1]. Big data can be gathered from various sources such as healthcare data, retail sector data, sensor data, posts to social media sites, digital pictures and videos, purchase transaction records and cell phone GPS signals. The size of big data is constantly doubling every 40 months, and in early 2013 it ranged around a few dozen zettabytes to yottabytes in a single data set [1]. We can classify collected data according to its nature as unstructured, semi-structured and structured data.

Figure 1. Classification of Big Data by Nature: structured, semi-structured and unstructured data

Many applications in various domains create different types of data, and much of it is unstructured or semi-structured. Transactional data, generated by the massive number of transactions/operations processed by large-scale systems, are structured with predefined schemas; web logs and business transactions fall into this category. Scientific data collected from data-intensive experiments or applications range from structured to semi-structured. Unstructured data contain metadata such as tags and usernames which are important for understanding the data. User-generated data from applications with massive numbers of users are typically unstructured, since they are contributed by users [4].

A. Attributes of Big Data

The definition of “Big Data” is further described in terms of its attributes as 3 V's: Volume, Variety and Velocity. Although most research papers focus only on these three main attributes, there are several others, and this paper mentions many of them. IBM added two more V's, Value and Veracity, making 5 V's of big data. Later, one more V, Variability, was proposed to make 6 V's of big data. These 6 V's are: Volume, Variety, Velocity, Value, Veracity and Variability [1].

Volume refers to the size and the number of data sets generated by different sources in real time. For example, web applications such as Facebook and Google deal with massive numbers of customers, so the data sets consumed by such applications are extraordinarily large. Such big-volume issues can also be found in the areas of finance, communication and business informatics, due to the wide application of information technology and the increasing intensity of online transactions [4].

Variety refers to the fact that data come from different sources and differ in type. It covers unstructured data such as blogs, emails, data from social networking sites, and audio and video messages; semi-structured data such as XML files; and structured data such as logs and data from databases and data warehouses, to name a few.

Velocity relates to the speed at which data are generated, the processing time of incoming data, and the frequency of delivery. Velocity also refers to the time sensitivity of processing, such as catching frauds [12].

After analysis, the stored data must be perceived to have a value. The data should be accurate and should have value for predicting trends, or storing that data will be worthless.

Veracity means the quality and accuracy (reliable sources) of the data, and the trustworthiness of data collection and processing methods. This ensures that data are protected from unauthorized access and modification throughout their lifecycle. This attribute is very important because these data are most often used to make decisions in organizations. The data source should therefore be trustworthy, and as the number of sources grows this becomes more challenging for the enterprise.

Variability refers to data whose meaning is constantly changing, and also to data that differ in quality and speed [1] and are generated by different types of devices.

Beyond these attributes, a group of researchers has proposed another attribute of big data: viability. Most data are multidimensional by nature, but not all dimensions may be relevant to the output during processing, and considering all of them can be inefficient in terms of space and time. Viability therefore means selecting the most relevant dimensions and factors for predicting outcomes [1].

Figure 2. The 7 V's of Big Data: Volume, Variety, Velocity, Value, Veracity, Variability and Viability

B. Big Data Analytics

Big data analytics is the use of advanced analytical techniques over large amounts of data in order to discover hidden patterns, unknown correlations and other useful information for decision making. A report by the McKinsey Global Institute (Manyika et al. 2011) predicted that by 2018 the United States alone would face a shortage of 140,000 to 190,000 people with deep analytical skills, as well as a shortfall of 1.5 million data-savvy managers with the know-how to analyze big data to make effective decisions. The opportunities associated with big data analytics in different organizations have arisen from this significant increase in data. Most companies use relational database management systems (RDBMS) for collected data, and the most common analytical techniques are statistical methods and data mining. These techniques are adopted for association analysis, data segmentation and clustering, classification and regression analysis, anomaly detection, and predictive modeling in various business applications [2]. Since the volume of data increases day by day, it is difficult to process using traditional databases and software systems. To analyze the data we therefore have to use specialized software tools and applications, or integrate such tools with typical software; for example, IBM integrated R and Hadoop. Most data-processing BI platforms are offered by major vendors including Microsoft, IBM and SAP.
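As an illustration of the clustering techniques mentioned above, the following is a minimal k-means sketch in plain Python; the data points and the choice of k=2 are made-up illustrative values, not taken from any dataset discussed in this paper.

```python
# A minimal k-means clustering sketch: one of the data segmentation and
# clustering techniques mentioned in the text, shown on 1-D toy data.

def kmeans_1d(points, k, iterations=10):
    """Cluster 1-D points into k groups by iteratively refining centroids."""
    # Initialise centroids with the first k distinct values (a naive choice).
    centroids = sorted(set(points))[:k]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = {c: [] for c in centroids}
        for p in points:
            nearest = min(centroids, key=lambda c: abs(c - p))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(ps) / len(ps) for ps in clusters.values() if ps]
    return sorted(centroids)

# Two obvious groups, one around 2 and one around 40.
data = [1, 2, 3, 38, 40, 42]
print(kmeans_1d(data, 2))  # [2.0, 40.0]
```

Production analytics would of course use a library implementation over distributed data; the sketch only shows the iterative assign-and-update idea behind segmentation.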

There are four main layers in big data: the data source layer, data storage layer, data processing/analyzing layer and data output layer. The data source layer holds structured data in RDBMS, NoSQL, HBase or Impala; unstructured data in Hadoop MapReduce; and streaming data from the web, social media, sensors and operational systems, with limited capabilities for performing descriptive analytics. Tools such as Hive, HBase, Storm and Spark also sit on this layer. Although Hadoop sits in the data layer, it is now capable of analyzing big data too. Data are usually stored in different places, and to make decisions we need to collect data stored in different warehouses. Apart from handling, reading and writing data on multiple disks, combining data read from multiple devices is a major challenge faced by techniques for processing and analyzing big data [1]. To analyze these data, we need parallel and distributed data management tools. Most research has focused on developing systems for update-intensive workloads as well as ad-hoc analysis workloads: initial designs include distributed databases [also see J. B. Rothnie Jr.] for update-intensive workloads, and parallel database systems [also see D. J. DeWitt] for analytical workloads [14]. The benefits of big data identified in the TDWI Big Data Analytics survey are: better-aimed marketing, more direct business insights, client-based segmentation, recognition of sales and market chances, automated decision making, definitions of customer behaviors, greater return on investment, quantification of risks and market trending, comprehension of business alteration, better planning and forecasting, identification of consumer behavior from click streams, and production yield extension [11].

III. APPLICATIONS OF BIG DATA ANALYTICS

A. E-Commerce

Big data are generated by web and e-commerce communities. Leading e-commerce vendors such as Amazon and eBay have incorporated innovative and highly scalable e-commerce platforms and product recommendation systems. Other applications include social media monitoring and analysis, crowd-sourcing systems, and social and virtual games, all of which produce large amounts of data. Business enterprises collect vast amounts of multi-modal data, including customer transactions, inventory management, store-based video feeds, advertising and customer relations, customer preferences and sentiments, sales management infrastructure, and financial data, among others [15].

B. E-Government

Nowadays politicians mostly use multimedia web platforms for policy discussions, campaign advertising, voter mobilization, event announcements, and online donations. Other applications that generate big data include ubiquitous government services, equal access to public services, citizen engagement and participation, political campaigns and e-polling.

C. Health Sector

The health sector generates data from numerous patient care points of contact, sophisticated medical instruments, and web-based health communities. Human and plant genomics, healthcare decision support and patient community analysis are applications that generate such data. While it is difficult to estimate current size and growth rates, by some estimates the global volume of clinical data stood at roughly 150 exabytes in 2011, increasing at a rate between 1.2 and 2.4 exabytes per year [15]. In addition to clinical data, healthcare data also include pharmaceutical data (drug molecules and structures, drug targets, other biomolecular data, high-throughput screening data (microarrays, mass spectrometry, sequencers), and clinical trials), data on personal practices and preferences (including dietary habits, exercise patterns and environmental factors), and financial/activity records [20].

D. Science and Technology

Because of improvements in technology, the data generated are rapidly increasing. Much of it comes from sensors, and from fields ranging from astrophysics and oceanography to genomics and environmental research.

IV. TECHNIQUES AND TOOLS

Various techniques, algorithms and tools for big data have been developed by researchers in areas such as artificial intelligence, algorithms and databases. There are several key approaches to analyzing data: discovery tools, used throughout the information cycle for rapid and intuitive exploration and analysis of any combination of structured and unstructured data; BI tools, which provide comprehensive capabilities for business intelligence and performance management, including enterprise reporting, dashboards, ad-hoc analysis, scorecards, and what-if scenario analysis on an integrated, enterprise-scale platform [9]; and in-database analytics, which apply techniques directly in databases to find patterns and relationships among the data. Organizations have various types of data and use various options for big data analytics, and it is difficult to select the best one from among the many options. Thus this paper focuses mainly on open source frameworks that help in processing big data.

Hadoop, hosted by the Apache Software Foundation, is an open source framework/platform for distributing computing problems across a number of servers. It makes it easy to process big data, and supports storing and processing large data sets across multiple clustered systems. Hadoop enables businesses to unlock potential value from new data using inexpensive commodity servers. Organizations primarily use Hadoop as a precursor to advanced forms of analytics [9].

A. HDFS

HDFS (Hadoop Distributed File System) enables the storage of large files by distributing the data among a pool of data nodes. This storage system aims to store large datasets across distributed clusters and provide high-performance access to the data in Hadoop clusters. HDFS is organized as clusters, and each cluster consists of several nodes. A file is divided into blocks, which are stored on different nodes. HDFS has two kinds of nodes, the NameNode and the DataNode, which differ in computation and functionality. The NameNode stores only the metadata for directories and files, while the data themselves are stored on DataNodes. HDFS manages the namespace tree and the mapping of files and directory blocks to DataNodes. When a client requests access to data in the cluster, it first sends the request to the NameNode, which knows on which DataNode the requested data reside. A TCP-based protocol is used for all communications and connections among the nodes in a cluster.
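The block-splitting scheme described above can be illustrated with a short Python sketch. This is a conceptual illustration, not the real HDFS API; among other simplifications it omits block replication, which real HDFS performs, and assigns blocks to nodes with a simple round-robin rule.

```python
# Conceptual sketch of HDFS-style storage: a file is split into fixed-size
# blocks, blocks are spread over DataNodes, and a "NameNode" keeps only the
# metadata mapping each block to the node that holds it.

def distribute_file(data: bytes, block_size: int, datanodes: list):
    """Split data into blocks and assign them round-robin to data nodes."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    namenode = {}                         # metadata only: block id -> node name
    storage = {n: {} for n in datanodes}  # actual block contents per node
    for block_id, block in enumerate(blocks):
        node = datanodes[block_id % len(datanodes)]
        namenode[block_id] = node
        storage[node][block_id] = block
    return namenode, storage

def read_file(namenode, storage, n_blocks):
    """A client asks the NameNode where each block lives, then reads it."""
    return b"".join(storage[namenode[i]][i] for i in range(n_blocks))

meta, nodes = distribute_file(b"hello big data world", block_size=8,
                              datanodes=["dn1", "dn2", "dn3"])
print(meta)                        # {0: 'dn1', 1: 'dn2', 2: 'dn3'}
print(read_file(meta, nodes, 3))   # reassembles the original bytes
```

The key point the sketch captures is the separation of concerns: the NameNode never touches block contents, only the block-to-node map that clients consult before contacting DataNodes directly.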

Figure 3. The Hadoop Distributed File System: a master node running the JobTracker and NameNode, and worker nodes (HOST 1 … HOST N) each running a TaskTracker and a DataNode

B. MapReduce

MapReduce is a programming model that combines the processing and generation of large data sets on parallel clusters. It is a programming paradigm for processing large amounts of data across several servers in Hadoop clusters: a framework for processing problems that can be parallelized across massive datasets using clusters, i.e. a large number of computers or a grid [1]. It consists of mappers and reducers, which carry out the two core functions, with a master node to control and coordinate the worker nodes. Map is the step that retrieves data from a distributed file system such as HDFS and distributes it for processing. A mapper receives its input as key-value pairs and processes them to generate output, also as key-value pairs; that output is written to local disk, not to HDFS. After the map step finishes, the reduce step begins, with the output of the map tasks becoming the input to the reducers.

Figure 4. Data Flow for Map Reduce
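The map/shuffle/reduce flow described above can be sketched as a word count in plain Python. This illustrates the paradigm only; it is not the actual Hadoop API, and the "shuffle" here is a simple in-memory grouping rather than the distributed sort Hadoop performs between phases.

```python
# Word count expressed as map -> shuffle (group by key) -> reduce.
from collections import defaultdict

def map_phase(document):
    """Mapper: emit a (key, value) pair of (word, 1) for every word."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Group intermediate pairs by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reducer: sum the values for each key to get per-word counts."""
    return {key: sum(values) for key, values in grouped.items()}

docs = ["big data big analytics", "big data tools"]
intermediate = [pair for d in docs for pair in map_phase(d)]  # map step
counts = reduce_phase(shuffle(intermediate))                  # shuffle + reduce
print(counts)  # {'big': 3, 'data': 2, 'analytics': 1, 'tools': 1}
```

Because each document is mapped independently and each key is reduced independently, both phases can run in parallel across many machines, which is exactly what makes the model suit the clusters described above.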

C. Pig and Hive

Pig is a high-level programming language that simplifies the common tasks of working with Hadoop: loading data, expressing transformations on the data, and storing the final results [3]. It is a data processing system for analyzing large data sets [11]. Hive enables Hadoop to operate as a data warehouse [3]: it acts like a data warehousing application that provides an SQL-like interface and a relational model [11]. As with Pig, Hive's core capabilities are extensible. Groups of researchers have examined the problem of sorting and analyzed its performance on a Hadoop cluster by varying the number of nodes and comparing the clusters in terms of the processing time required to complete a given job [1]. They found that the processing time of a given job decreases as the number of nodes within a cluster increases. The experimental results were as follows.

TABLE 1: EXPERIMENTAL RESULTS

No. of nodes in the cluster (N)    Time taken for processing (minutes)
N=1                                16:20
N=2                                11:17
N=3                                8:29
N=4                                4:35
N=5                                3:48
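The scaling behaviour reported in Table 1 can be quantified as speedup (T1/TN) and parallel efficiency (speedup/N). The following short sketch computes both from the reported times; the numbers come straight from the table, and only the mm:ss-to-seconds conversion is added.

```python
# Derive speedup and parallel efficiency from the Table 1 measurements.

def to_seconds(mm_ss):
    """Convert an 'mm:ss' time string to total seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)

times = {1: "16:20", 2: "11:17", 3: "8:29", 4: "4:35", 5: "3:48"}
t1 = to_seconds(times[1])  # single-node baseline: 980 seconds

for n, t in times.items():
    speedup = t1 / to_seconds(t)
    print(f"N={n}: speedup {speedup:.2f}, efficiency {speedup / n:.2f}")
```

For example, at N=5 the job takes 228 seconds against a 980-second baseline, a speedup of roughly 4.3, confirming the sub-linear but substantial scaling the authors report.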

Revolution R is another open source analysis technology with deep integration for statistical analysis. R is one of the most powerful statistical programming languages: analysts can write R code to manipulate data without the need for SQL statements, R programs can run inside the database so there is no need to move the data, and R can push computation to the data and carry out the analysis in parallel.
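The idea of pushing computation to the data rather than moving the data can be sketched as follows (in Python rather than R, and with made-up partition values): each node computes a small summary of its own partition, and only those summaries, not the rows, are moved and combined centrally.

```python
# Partial aggregation: compute (count, sum) next to each data partition,
# then merge only the tiny summaries to obtain a global result.

def local_summary(partition):
    """Runs next to the data: returns only (count, sum) for this partition."""
    return len(partition), sum(partition)

def combine(summaries):
    """Runs centrally: merges per-partition summaries into a global mean."""
    total_count = sum(c for c, _ in summaries)
    total_sum = sum(s for _, s in summaries)
    return total_sum / total_count

partitions = [[10, 20, 30], [40, 50], [60]]          # data on three nodes
summaries = [local_summary(p) for p in partitions]   # only summaries move
print(combine(summaries))  # mean of all values: 35.0
```

Only aggregates whose partial results merge correctly (counts, sums, min/max, and means derived from them) decompose this simply; the point is that network traffic stays proportional to the number of partitions, not the number of rows.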

Figure 5. Hybrid model that combines all the techniques

V. BIG DATA ANALYTICS WITH CLOUD COMPUTING

The continuous increase in the detail and volume of data captured by enterprises, for example through the rise of social media, the internet of things (IoT) and multimedia, has generated an overwhelming flow of data in structured, semi-structured and unstructured formats. Such huge data is too large to process and also difficult to move. This becomes possible with cloud computing: most public data, such as Twitter, Facebook, financial market data, weather data, genome datasets and aggregated industry-specific data, live in the cloud, and it has become more effective and efficient for organizations to analyze the data in the cloud itself [13]. Cloud services are capable of ingesting, storing and analyzing data; these services have been around for some time and enable organizations to overcome the challenges associated with big data [12]. There are many tools and platforms that provide cloud infrastructure for analyzing big data, such as MapReduce [12]. Cloud computing is a successful paradigm of service-oriented computing, consisting of three major models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) [14]. Early on, big data on cloud platforms was deployed as Hadoop clusters on the flexible and scalable environments provided by Infrastructure as a Service providers such as Rackspace and Amazon Web Services, for testing and developing systems and for analyzing existing data. These IaaS providers offer data storage and data backup in low-cost, reliable environments [12] [also see A. Abouzeid]. Software as a Service providers have embedded analytics engines to help analyze the data stored on cloud platforms, with the analytics output delivered through a graphical interface [12] [also see R. Baraglia].

There are some key factors of cloud computing that benefit big data analytics [12]:

- Reduced cost – Cloud computing is based on the concept of pay-as-you-go, and the cost reduction is its clearest benefit. The organization spends only on incremental additions of required capacity and does not need to invest for maximum capacity.
- Improved automation – Cloud computing is based on services that can be provisioned and de-provisioned automatically.
- Focus on core competency – Enterprises can use cloud computing to focus on their core operations and objectives, leveraging IT resources as a means of providing services.

Many solutions are available for big data analytics on cloud computing because there is a wide range of analytic requirements. Cloud computing plays a major role in big data analytics not only because it provides infrastructure and tools, but also because it offers business models that big data analytics can follow, for example Analytics as a Service and Big Data as a Service [13].

VI. AREAS OF BIG DATA ANALYTICS

A. Data Management and Supporting Architectures

Performing good analytics on large volumes of data requires expert ways to retrieve, transform, filter and store the data. We need data management and supporting architectures to utilize specialized resources efficiently. Data experts should also consider the storage and retrieval of data for analytics, data velocity and integration, data variety, and data processing tasks. Data storage systems have been proposed to store and retrieve the large amounts of data demanded by big data. Internet-scale file systems such as the Google File System (GFS) attempt to achieve the scalability, reliability and robustness that internet services need for data storage [13]. Data integration has become a major challenge in big data; many papers have proposed algorithms to automatically incorporate user feedback and update existing data in data integration programs [4]. With suitable big data analytics and data processing, organizations are freed from unnecessary data transfer and replication and from the use of disparate data processing and analytical solutions. MapReduce is the most popular programming model for processing massive amounts of data on clusters of computers, developed for data processing and resource management, and Hadoop is the most used open source MapReduce implementation [13].

B. Model Development and Scoring

Data storage and Data as a Service (DaaS) capabilities are important services provided by the cloud, but for analytics we can also build models from the data and use them for forecasts and prescription. Since models are developed from available data, they need to be tested with new data to appraise their ability to forecast future behavior. Zementis is one technology that provides data analysis and model building; it can run either on a customer's premises or as SaaS on Infrastructure as a Service (IaaS) provided by solutions such as Amazon. The Google Prediction API allows users to build machine learning models that predict numeric values for a new item based on previously submitted training data, or predict the category that best describes an item [13].

C. Visualization and User Interaction

Effective visualization tools are crucial because of the increasing amounts of data that must be managed after analysis. These tools have to consider both the quality of the data and its presentation in order to facilitate navigation [13] [see also J. Davey]. Analysts select the type of visualization needed based on the amount of data to be presented, or with a view to improving both performance and display. Visualization can support the three main types of big data analytics: descriptive, predictive and prescriptive. Some visualization tools cannot present the more advanced aspects of big data analytics. Another important consideration for visualization and user interaction is that the network is still a bottleneck in several scenarios.

In addition to visualizations of raw data, summarized content in reports is important for presenting predictive and prescriptive analytics. SAP Crystal Solutions [13] provides BI functionality through which customers can explore available data to build reports with interactive charts, what-if scenarios, and dashboards.

D. Business Models

Besides providing tools and techniques, users can build their own big data analytic solutions, such as models for delivering analytical capabilities as a service on a cloud. This reflects the current state of developing customized solutions to users' needs on their own premises, and raises some of the challenges of enabling analytics. Some enterprises have proposed business models that include the following features:

- Hosting customer analytics jobs in a shared platform – appropriate for enterprises that run multiple analytics.
- A full stack designed to provide customers with end-to-end solutions – suitable for companies that lack people with expert knowledge of analysis; this model allows analytical service providers to publish domain-specific analytical stream templates.
- Exposing analytics models as hosted services – analytic capabilities are hosted on the cloud and provided to users as services; this model is ideal for companies that do not have enough data for good prediction.

VII. ISSUES AND CHALLENGES

Big data involves large amounts of data coming from various sources, so it is necessary to capture data from multiple devices. Since the use of devices is increasing over time, the volume of data is also increasing, and data volume is scaling faster than computing resources. The challenge is that the techniques and tools used for analytics must be able to handle this. For example, in trading systems big data sizes are constantly growing from a few terabytes to several zettabytes, yet such systems require sub-second response times regardless of how big the data is.

Another challenge comes with the velocity attribute. Analytical platforms should be able to handle the speed at which data are created and existing data are updated. This issue mainly applies to machine-generated data, such as data generated by sensing or mobile devices distributed everywhere. It challenges both the storage layer and the query-processing layer: both must be scalable and fast.

Since the data come from different sources, their types differ from one source to another, and the data may be in different formats and models. To provide more information for solving a problem or delivering a better service, we need to capture different types of data, so big data analytical tools and techniques should be able to update and adapt to new data formats.

Unstructured data cannot be effectively understood or efficiently processed in raw form. Big unstructured data are processed through solutions that operate on extracted structured data, which consists of summaries and sketches of the original unstructured data [4]. This transformation involves information loss, which is an issue in big data analytics. Instead of operating directly on complex and large raw data, big data reduction tools enable the execution of various data analytics and management tasks [4]; the challenge is to implement reduction tools with minimal data loss.

At present, due to the limitations of artificial intelligence, there can be errors when analyzing the data. To overcome this, researchers suggest using user feedback to update the systems. Analyzing big data also requires large processing power, which is another major issue: the larger the data, the longer the analysis takes. The privacy of data is another huge concern common to almost any scenario.

Most of the data generated by organizations are associated with decision-making processes and contain sensitive information, so security is a major concern in big data. Companies sometimes refuse to outsource data storage and analytics and tend to do them in house. Keeping the data in one place is an advantage for attackers, so database storage requires authentication and a cryptographically secure communication framework. In addition, the Intel IT Center identifies the following obstacles for big data: security concerns, capital/operational expenses, increased network

bottlenecks, shortage of skilled data science professionals, unmanageable data rates, limited data replication capabilities, lack of compression capabilities, greater network latency and insufficient CPU power [11]. Researchers say that the most challenging part of big data analytics is designing the most appropriate business models and algorithms. In the last few decades, researchers have developed high-performance models and algorithms for big data analytics; however, multidimensional sparse data, uncertain data and incomplete data can distort the results of current algorithms, and advanced algorithms are required to tackle unstructured and semi-structured data. Organizations therefore face the challenge of selecting the most efficient and effective algorithms for analyzing big data. Moreover, because of its noisy, dynamic, heterogeneous, inter-related and untrustworthy properties, analyzing big data is challenging, and using big data in decision making requires deep analysis, so implementing deep big data analytical tools is itself a challenge.

ACKNOWLEDGMENT

I would like to pay my sincere gratitude to the supervisor of Independent Studies, Mrs M. B. Mufitha, for her advice on approaching the problem and for her guidance and valuable suggestions, which inspired this accomplishment. Her wide knowledge and logical way of thinking have been of great value to me, and her understanding, encouragement and personal guidance have provided a good basis for this review paper. I would also like to thank everyone who helped me write this review paper. Finally, my gratitude goes to the Faculty of Information Technology, University of Moratuwa, for encouraging me to undertake this review paper as an undergraduate.

REFERENCES

[1] Punam Bedi, Vinita Jindal and Anjali Gautam, “Beginning With Big Data Simplified”, Department of Computer Science, University of Delhi, 2014.
[2] Hsinchun Chen, Roger H. L. Chiang and Veda C. Storey, “Business Intelligence and Analytics: From Big Data to Big Impact”, MIS Quarterly, Vol. 36, No. 4, pp. 1165-1188, December 2012.
[3] O'Reilly Media, Inc., “Big Data Now”, 1st Ed., USA, O'Reilly Media, Inc., 2012.
[4] Jinchuan Chen, Yueguo Chen, Xiaoyong Du, Cuiping Li, Jiaheng Lu, Suyun Zhao and Xuan Zhou, “Big Data Challenge: A Data Management Perspective”, Key Laboratory of Data Engineering and Knowledge Engineering, School of Information, Renmin University of China, 2013.
[5] Hongye Zhong, Jitian Xiao and Xiangxin Zheng, “Enhance Enterprise Android Application Security with Cloud Computing and Big Data Analytics”, UCWeb Inc., 2014.
[6] Liangwei Zhang, “Big Data Analytics for eMaintenance: Modeling of High-Dimensional Data Streams”, Operation and Maintenance Engineering, Luleå University of Technology, 2015.
[7] Jainendra Singh, “Real Time Big Data Analytic: Security Concern and Challenges with Machine Learning Algorithm”, Department of Computer Science, Maharaja Surajmal Institute, 2014.
[8] Paul C. Zikopoulos, Chris Eaton, Dirk deRoos, Thomas Deutsch and George Lapis, “Understanding Big Data”, USA, McGraw-Hill, 2012.
[9] Oracle Corporation, “Big Data Analytics: Advanced Analytics in Oracle Database”, Oracle White Paper, March 2013.
[10] Revolution Analytics, “Advanced ‘Big Data’ Analytics with R and Hadoop”, 2011.
[11] Seref Sagiroglu and Duygu Sinanc, “Big Data: A Review”, Department of Computer Engineering, Faculty of Engineering, Gazi University, Ankara, Turkey, 2013.
[12] Vinay Kumar Jain and Shishir Kumar, “Big Data Analytic Using Cloud Computing”, Second International Conference on Advances in Computing and Communication Engineering, 2015.
[13] Marcos D. Assunção, Rodrigo N. Calheiros, Silvia Bianchi, Marco A. S. Netto and Rajkumar Buyya, “Big Data Computing and Clouds: Trends and Future Directions”, May 2014.
[14] Divyakant Agrawal, Sudipto Das and Amr El Abbadi, “Big Data and Cloud Computing: Current State and Future Opportunities”, Department of Computer Science, University of California, USA, March 2011.
[15] Karthik Kambatla, Giorgos Kollias, Vipin Kumar and Ananth Grama, “Trends in Big Data Analytics”, J. Parallel Distrib. Comput., 2014.
