Big data is about more than volume, variety, and velocity alone. To plan for both the challenges and the returns of big data initiatives, you need to understand the ten characteristics and properties of big data. Veracity is one of the unfortunate characteristics of big data: as the other characteristics increase, veracity (confidence or trust in the data) tends to drop. Veracity, one of the 5 Vs of big data, refers to the trustworthiness, quality, and credibility of the data an organization has collected to gain accurate insights for sound decision-making.
Data creation and consumption are becoming a way of life. According to a recent IBM research report, 2.5 quintillion bytes of data were produced globally every day in 2017, and analysts predict that over 1.7 megabytes of new information will be created every second for every person in the world. Generally speaking, all of the cleansing, profiling, transformation, and discovery work discussed here should be framed in terms of data captured or extracted from the web. Each website should be treated as a source, and you should use language from that standpoint rather than the traditional data integration slant on enterprise data management and data from traditional sources. Only after a data source has been analyzed and set up can data processing continue. Data cleansing relies on complete and continuous data profiling to identify the data quality issues that need to be addressed.
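To make the profiling idea concrete, here is a minimal sketch using pandas; the file name and column set are hypothetical, so treat this as an assumed illustration rather than part of the original post.

import pandas as pd

# Profile a hypothetical web-extracted dataset to surface quality issues
# (missing values, duplicates) before any cleansing is attempted.
df = pd.read_csv("customers.csv")        # assumed source file

profile = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "missing": df.isna().sum(),
    "missing_pct": (df.isna().mean() * 100).round(1),
    "unique_values": df.nunique(),
})
print(profile)
print("duplicate rows:", df.duplicated().sum())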
Most generally, data veracity is the degree of accuracy or truthfulness of a set of data. In the case of big data, it is not just the quality of the data that matters but also how trustworthy the source, the data type, and the data processing are.
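As a rough illustration (an assumed heuristic, not a standard formula), veracity could be approximated as the share of records that are complete and pass basic validity checks; the file, column, and range below are hypothetical.

import pandas as pd

# Assumed heuristic: a row counts as trustworthy only if it has no missing
# fields and its temperature falls inside a plausible range.
df = pd.read_csv("sensor_readings.csv")              # hypothetical data
complete = df.notna().all(axis=1)                    # no missing fields
valid = df["temperature_c"].between(-50, 60)         # assumed valid range
veracity_score = (complete & valid).mean()           # fraction of good rows
print(f"veracity score: {veracity_score:.1%}")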
Sources of data veracity
Figure: Sources of data veracity (source: Datafloq weekly digest)
What is data transformation?
Data transformation is the process that changes the format, structure, and values of data. In data analytics projects, data may be transformed at two different stages of the data pipeline. Organizations with an on-premises data warehouse normally use an ETL (extract, transform, load) process, in which data transformation is the middle step. Nowadays, organizations increasingly use cloud-based data warehouses, which can scale compute and storage resources in seconds or minutes. With this approach, a pipeline can skip the preload transformation, load raw data into the data warehouse, and transform it at query time (an ELT, or extract, load, transform, pattern).
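As a sketch of the transform step (assuming pandas; the table, column names, and rules are hypothetical), an ETL job might standardize formats and values like this before loading:

import pandas as pd

# Hypothetical transform step: standardize format, structure, and values
# after extraction and before loading into the warehouse.
raw = pd.read_csv("orders_raw.csv")                  # assumed extracted data

transformed = (
    raw.rename(columns=str.lower)                    # consistent column names
       .assign(
           order_date=lambda d: pd.to_datetime(d["order_date"], errors="coerce"),
           amount_usd=lambda d: pd.to_numeric(d["amount"], errors="coerce"),
           country=lambda d: d["country"].str.strip().str.upper(),
       )
       .drop(columns=["amount"])
)

transformed.to_csv("orders_clean.csv", index=False)  # ready to load

In the cloud-based ELT pattern described above, the same cleanup logic would typically be expressed in SQL and run inside the warehouse after the raw data has been loaded.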
Benefits of data transformation
·         Transformation makes data better organized; transformed data may also be easier for both humans and computers to use.
·         Properly formatted and validated data protects applications from problems such as null values, unexpected duplicates, and incorrect indexing (see the sketch after this list).
·         Data transformation enables compatibility between applications, systems, and data types.
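The sketch below (again assuming pandas and hypothetical column names) illustrates the second benefit: removing null values, unexpected duplicates, and a broken index before the data reaches an application.

import pandas as pd

# Hypothetical cleanup protecting downstream applications from null values,
# unexpected duplicates, and incorrect indexing.
df = pd.read_csv("users.csv")                        # assumed input

clean = (
    df.dropna(subset=["user_id", "email"])           # drop rows missing key fields
      .drop_duplicates(subset=["user_id"])           # remove unexpected duplicates
      .reset_index(drop=True)                        # rebuild a correct index
)
clean["email"] = clean["email"].str.lower().str.strip()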
Author: Harikrishnan H
Keywords
#big data
#veracity
#data cleansing
#data transformation
#ETL
Reference
Datafloq.com. 2021. Data Veracity: a New Key to Big Data. [online] Available at: <https://datafloq.com/read/data-veracity-new-key-big-data/6595> [Accessed 7 March 2021].
Import.io. 2021. What is Data Cleansing and Transformation/Wrangling? | Import.io. [online] Available at: <https://www.import.io/post/what-is-data-cleansing-and-transformation-wrangling/> [Accessed 7 March 2021].
Stitch. 2021. What is data transformation: definition, benefits, and uses | Stitch resource. [online] Available at: <https://www.stitchdata.com/resources/data-transformation/> [Accessed 7 March 2021].
Transforming Data with Intelligence. 2021. The 10 Vs of Big Data | Transforming Data with Intelligence. [online] Available at: <https://tdwi.org/articles/2017/02/08/10-vs-of-big-data.aspx> [Accessed 7 March 2021].
Good work Hari. Your ideas about big data and data transformation are really helpful
Very helpful and informative as well
Great work 👍
You deserve great appreciation for this. Good job. 💯
Good job
Well done
Good narration and an accurate piece of information
Great work 👏👏👏
Very informative! Looking forward to more content.
Of course, veracity is as important as the velocity, volume, and variety of big data. Consumers and companies need to know how trustworthy the data is. Since companies only want to store and mine the data relevant to solving their problems, data scientists need to be extremely careful to keep bias and noise out of the datasets being analyzed. The big data that is generated is often inconsistent and full of anomalies, making it difficult for enterprises to make sense of it and build trust in their data. Trust is a big factor for any business, which I understand completely from this blog post. Keep it up!
Great work! This process is crucial because wrong data can drive a business to wrong decisions, wrong conclusions, and poor analysis, especially when huge quantities of big data are in the picture.
ReplyDeleteWell written Harikrishnan… I
ReplyDeletet is very much true that data veracity is the degree to which a data collection is reliable or truthful. Adding to what you have written in the blog, I wish to share a couple of insights more to it. When it comes to big data accuracy, it's not only about the data's quality; it's also about how reliable the data source, type, and processing are. Improving the accuracy of big data requires removing bias, anomalies or discrepancies, replication, and uncertainty, to name a few factors. Volatility is out of our reach at times, unfortunately. The rate of change and lifespan of the data are referred to as volatility, or another "V" in big data. Social networking, where sentiments and trending topics shift rapidly and often, is an example of extremely volatile data. Weather patterns, which shift less often and are easier to forecast and monitor, are an example of less volatile data.
-- Thomas Devasia
Good work Hari
Expecting more from you
I'm really amazed to see such an informative blog
It's really good information