Big Data



Feature Selection in Data Mining

In machine learning and statistics, feature selection, also known as variable selection, is the process of choosing a subset of relevant features for use in model construction. The central premise of using a feature selection technique is that the data contains a number of attributes. A feature selection algorithm can be seen as the combination of a search procedure for proposing new attribute subsets, along with...
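The idea of a search procedure that proposes attribute subsets can be sketched in a few lines of Python. This is a minimal illustration, not the method from the full post: the excerpt is truncated, so the scoring step is an assumption (here, mean absolute Pearson correlation with the target), and the names `pearson`, `score_subset`, `select_features`, and the sample data are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def score_subset(columns, target, subset):
    """Assumed evaluation measure: mean absolute correlation with the target."""
    return mean(abs(pearson(columns[name], target)) for name in subset)


def select_features(columns, target, k):
    """Search procedure: enumerate every subset of size k and keep the best-scoring one."""
    candidates = combinations(sorted(columns), k)
    return max(candidates, key=lambda subset: score_subset(columns, target, subset))


# Hypothetical data: "a" and "b" track the target closely, "noise" does not.
columns = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 9],
    "noise": [5, 1, 4, 2],
}
target = [1, 2, 3, 4]

best = select_features(columns, target, 2)
print(best)
```

Exhaustive enumeration is only practical for a handful of attributes, since the number of subsets grows combinatorially; greedy or heuristic searches are the usual substitute on wider data.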



Hadoop structure

Hadoop Introduction

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high throughput access to application data and is suitable for applications that have large...



Data flow diagram for the word count problem

Hadoop and Word Count | Hadoop Distributed File System

Hadoop is an open-source Apache framework, written in Java, that allows distributed processing of large datasets across clusters of computers using simple programming models. An application built on the Hadoop framework runs in an environment that provides distributed storage and computation across those clusters. Hadoop is designed to scale from a single server to thousands of machines, each offering local computation and storage. Hadoop Architecture...
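The word count problem named in the title is commonly solved with the map/reduce programming model. The sketch below imitates that flow in memory with plain Python; it does not use any Hadoop API, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, chosen only to mirror the stages.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce step: sum the grouped counts for a single word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


# Hypothetical input; on a cluster each line could sit on a different machine.
lines = ["Hadoop stores data", "Hadoop processes data"]
print(word_count(lines))
```

On a real cluster the map and reduce steps would run in parallel on the machines that hold the data, which is what lets the same logic scale from one server to many.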
