Big data is a collection of large datasets that cannot be processed using traditional computing techniques. Testing these datasets involves a variety of tools, techniques and frameworks. Big data concerns the creation, storage, retrieval and analysis of data that is remarkable in terms of volume, variety and velocity. The Oniyosys Big Data Testing Services Solution offers end-to-end testing, from data acquisition testing to data analytics testing.
Big Data Testing Strategy
Testing a Big Data application is more about verifying its data processing than testing the individual features of the software product. In Big Data testing, performance and functional testing are key.
In Big Data testing, QA engineers verify that terabytes of data are processed successfully using a commodity cluster and other supporting components. This demands a high level of testing skill because the processing is very fast. Processing may be of three types:
1. Batch
2. Real-time
3. Interactive
Data quality is also an important factor in Big Data testing. Before testing the application, it is necessary to check the quality of the data, and this check should be treated as part of database testing. It involves checking characteristics such as conformity, accuracy, duplication, consistency, validity and completeness.
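As a rough illustration of the data quality checks listed above, the following Python sketch profiles a small batch of records for completeness, duplication and validity. The field names (`id`, `email`, `age`), the sample records and the age rule are assumptions made up for this example and do not come from any particular tool.

```python
from collections import Counter

# Hypothetical sample records; field names and values are illustrative only.
RECORDS = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": None, "age": 29},
    {"id": 2, "email": "b@example.com", "age": 29},
    {"id": 3, "email": "c@example.com", "age": -5},
]

REQUIRED_FIELDS = ("id", "email", "age")


def profile_quality(records):
    """Return simple counts for completeness, duplication and validity."""
    # Completeness: records with a missing (None) required field.
    incomplete = sum(
        1 for r in records if any(r.get(f) is None for f in REQUIRED_FIELDS)
    )
    # Duplication: ids that occur more than once.
    id_counts = Counter(r["id"] for r in records)
    duplicate_ids = sorted(k for k, n in id_counts.items() if n > 1)
    # Validity: assumed rule that age must lie between 0 and 130.
    invalid_age = sum(
        1 for r in records
        if r.get("age") is not None and not 0 <= r["age"] <= 130
    )
    return {
        "total": len(records),
        "incomplete": incomplete,
        "duplicate_ids": duplicate_ids,
        "invalid_age": invalid_age,
    }


report = profile_quality(RECORDS)
print(report)
```

On a real cluster the same counts would more likely be computed with a distributed query than in a single process, but the checks themselves stay the same.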
Testing Steps in verifying Big Data Applications
The following figure gives a high-level overview of the phases in testing Big Data applications.
Step 1: Data Staging Validation
- The first step of Big Data testing, also referred to as the pre-Hadoop stage, involves process validation.
- Data from various sources such as RDBMS, weblogs and social media should be validated to make sure that the correct data is pulled into the system.
- Compare the source data with the data pushed into the Hadoop system to make sure they match.
- Verify that the right data is extracted and loaded into the correct HDFS location.
- Tools such as Talend and Datameer can be used for data staging validation.
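The comparison between source data and staged data can be sketched in Python as a record count plus a per-row fingerprint check. This is only a minimal illustration: the function names and the sample rows are hypothetical, and in practice the two inputs would be read from the source RDBMS and from the files landed in HDFS.

```python
import hashlib


def row_fingerprint(row):
    """Hash a row's fields in a fixed column order so equal rows hash equally."""
    joined = "\x1f".join(str(value) for value in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def compare_source_to_staged(source_rows, staged_rows):
    """Report rows missing from, or unexpected in, the staged copy."""
    source = {row_fingerprint(r) for r in source_rows}
    staged = {row_fingerprint(r) for r in staged_rows}
    return {
        "source_count": len(source_rows),
        "staged_count": len(staged_rows),
        "missing_in_staged": len(source - staged),
        "unexpected_in_staged": len(staged - source),
    }


# Hypothetical extracts standing in for the real source and staged data.
source_rows = [
    (1, "alice", "2020-01-01"),
    (2, "bob", "2020-01-02"),
    (3, "carol", "2020-01-03"),
]
staged_rows = [
    (1, "alice", "2020-01-01"),
    (3, "carol", "2020-01-03"),
]

result = compare_source_to_staged(source_rows, staged_rows)
print(result)
```

The set comparison ignores exact duplicate rows, which is why the raw counts are reported alongside it: a count mismatch with no missing fingerprints points to duplicated or dropped duplicate rows.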
Step 2: "MapReduce" Validation
The second step is validation of "MapReduce". In this stage, the tester verifies the business logic on every node and then validates it again after running against multiple nodes, ensuring that:
- The MapReduce process works correctly
- Data aggregation or segregation rules are implemented on the data
- Key-value pairs are generated
- The data is valid after the MapReduce process
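One way to picture this check is a small in-memory model of a map and reduce step, run once with all input in one partition and once split across two, with the results compared. The sketch below assumes a word count as the business rule purely for illustration. It does not use Hadoop itself, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (word, 1) key-value pairs for one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_phase(pairs):
    """Sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(partitions):
    """Map each partition separately, then reduce all emitted pairs together."""
    pairs = []
    for partition in partitions:
        for line in partition:
            pairs.extend(map_phase(line))
    return reduce_phase(pairs)


lines = ["big data testing", "data quality", "big data"]

single_node = run_job([lines])                # all lines in one partition
multi_node = run_job([lines[:1], lines[1:]])  # same lines split across two

# The aggregation must not depend on how the input was partitioned.
print(single_node == multi_node)
```

A test of a real job would follow the same pattern: a known small input, an expected set of key-value pairs, and a check that the aggregated output is identical however the input is split.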
Step 3: Output Validation Phase
The third and final stage of Big Data testing is output validation. The output data files are generated and ready to be moved to an EDW (Enterprise Data Warehouse) or any other system, depending on the requirement.
Activities in the third stage include:
- Checking that the transformation rules are correctly applied
- Checking data integrity and the successful data load into the target system
- Checking that there is no data corruption by comparing the target data with the HDFS file system data
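The output checks above can be sketched as follows: apply the expected transformation rule to each HDFS record and compare the result with the row found in the target system. The transformation rule, field names and sample data here are assumptions invented for the example.

```python
def transform(record):
    """Assumed rule: convert cents to currency units, upper-case the country code."""
    return {
        "id": record["id"],
        "amount": record["amount_cents"] / 100,
        "country": record["country"].upper(),
    }


def find_mismatches(hdfs_records, target_records):
    """Return ids whose target row is missing or differs from the expected row."""
    target_by_id = {row["id"]: row for row in target_records}
    mismatches = []
    for record in hdfs_records:
        expected = transform(record)
        if target_by_id.get(expected["id"]) != expected:
            mismatches.append(expected["id"])
    return mismatches


# Hypothetical HDFS output and the rows loaded into the target system.
hdfs_records = [
    {"id": 1, "amount_cents": 1250, "country": "us"},
    {"id": 2, "amount_cents": 400, "country": "de"},
    {"id": 3, "amount_cents": 725, "country": "in"},
]
target_records = [
    {"id": 1, "amount": 12.5, "country": "US"},
    {"id": 2, "amount": 4.0, "country": "de"},  # rule not applied to country
    # id 3 was never loaded
]

mismatches = find_mismatches(hdfs_records, target_records)
print(mismatches)
```

Here id 2 is flagged because the transformation was not applied and id 3 because the row never reached the target, which covers both the transformation check and the load check in one pass.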
Architecture Testing
Hadoop processes very large volumes of data and is highly resource intensive. Architectural testing is therefore crucial to the success of a Big Data project. A poorly or improperly designed system may suffer performance degradation and fail to meet requirements. At a minimum, performance and failover testing should be carried out in a Hadoop environment.
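As a very small illustration of the performance side, the sketch below times a processing step over a batch of records and reports throughput. `sample_job` is a hypothetical stand-in for a real processing step, and a single-process timing like this is only an assumption-laden starting point, not a substitute for load testing a real cluster.

```python
import time


def measure_throughput(job, records):
    """Run job over records and report elapsed time and records per second."""
    start = time.perf_counter()
    processed = sum(1 for _ in map(job, records))
    elapsed = time.perf_counter() - start
    return {
        "records": processed,
        "seconds": elapsed,
        # Guard against a zero reading on very coarse timers.
        "records_per_second": processed / elapsed if elapsed > 0 else float("inf"),
    }


def sample_job(record):
    """Hypothetical stand-in for a real processing step."""
    return record * 2


stats = measure_throughput(sample_job, range(100_000))
print(stats["records"], "records processed")
```

Recording the same figure for increasing input sizes gives a simple view of how throughput degrades as volume grows.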
Tools used in Big Data Scenarios
NoSQL Databases: CouchDB, MongoDB, Cassandra, Redis, ZooKeeper, HBase
MapReduce: Hadoop, Hive, Pig, Cascading, Oozie, Kafka, S4, MapR, Flume
Storage: S3, HDFS (Hadoop Distributed File System)
Servers: Elastic, Heroku, Google App Engine, EC2
Processing: R, Yahoo! Pipes, Mechanical Turk, BigSheets, Datameer
Challenges In Big Data Testing:
1. Huge Volume and Heterogeneity
Testing a huge volume of data is the biggest challenge in itself. A decade ago, a data pool of 10 million records was considered massive. Today, businesses work with petabytes or even exabytes of data, extracted from various online and offline sources, to conduct their daily business. Testers are required to audit this data to ensure that it is fit for business purposes. It is difficult to store such large, inconsistent data and to prepare test cases for it. Full-volume testing is not feasible at this data size.
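A common response to this limit is to test on a representative subset rather than the full volume. The following sketch draws a stratified sample so that small sources are not lost among large ones. The `source` field, the sample sizes and the fixed seed are assumptions chosen for the example.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Sample a fraction from each group, keeping at least one record per group."""
    rng = random.Random(seed)  # fixed seed so a test run is repeatable
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    sample = []
    for name in sorted(groups):
        members = groups[name]
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


# Hypothetical records from three sources of very different sizes.
records = (
    [{"source": "weblog", "n": i} for i in range(1000)]
    + [{"source": "rdbms", "n": i} for i in range(100)]
    + [{"source": "social", "n": i} for i in range(5)]
)

sample = stratified_sample(records, key="source", fraction=0.01)
print(len(sample), sorted({r["source"] for r in sample}))
```

Sampling per source keeps every kind of data represented in the test set, which matters when the sources differ in format as well as in size.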
2. Understanding the Data
For a Big Data testing strategy to be effective, testers need to continuously monitor and validate the 4Vs (basic characteristics) of data: Volume, Variety, Velocity and Value. Understanding the data and its impact on the business is the real challenge faced by any Big Data tester. It is not easy to estimate the testing effort or define a strategy without proper knowledge of the nature of the available data.
3. Dealing with Sentiments and Emotions
In a big data system, unstructured data drawn from sources such as tweets, text documents and social media posts supplements a data feed. The biggest challenge testers face with unstructured data is the sentiment attached to it. For example, consumers tweet about and discuss a new product launched in the market. Testers need to capture these sentiments and transform them into insights for decision making and further business analysis.
4. Lack of Technical Expertise and Coordination
Technology is evolving, and everyone is struggling to understand the algorithms used to process Big Data. Big Data testers need to understand the components of the Big Data ecosystem thoroughly. Today, testers understand that they have to think beyond the regular parameters of automated and manual testing. Big Data, with its unexpected formats, can cause problems that automated test cases fail to handle. Creating automated test cases for such a Big Data pool requires expertise and coordination between team members. The testing team should coordinate with the development and marketing teams to understand data extraction from different sources, data filtering, and pre- and post-processing algorithms. Although a number of fully automated testing tools are available in the market for Big Data validation, testers still need the skill set to use them and to leverage Big Data technologies such as Hadoop. This calls for a remarkable shift in mindset, both for testing teams within organizations and for individual testers. Organizations also need to be ready to invest in Big Data-specific training programs and in developing Big Data test automation solutions.
At Oniyosys, we conduct a detailed study of current and new data requirements and apply appropriate data acquisition, data migration and data integration testing strategies to ensure seamless integration for your Big Data testing.