Use Data Visualization for

Aman S
3 min read · Jul 25, 2022

Data Visualization is a key skill for a Data Analyst or Data Scientist.

No one likes to see just the numbers in your project; people understand data better when it is visualized.

The ability to take data-to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it-that’s going to be a hugely important skill in the next decades. — Hal Varian (Google’s Chief Economist)

The key skills are Data Quality, Data Exploration and Data Presentation.

  1. Data Quality: Check the quality of your data, for example by identifying outliers.
  2. Data Exploration: Understand the data by visualizing it.
  3. Data Presentation: Present your results and tell a good story about the data you worked on.

Matplotlib is an easy-to-use visualization library for Python.

What visualization gives you:

  1. Absorb information quickly
  2. Improve insights
  3. Make faster decisions

Data Quality

Is your data of usable quality?
Do not forget to check whether the data has missing values.

Check for missing values

data.isna().any()

The call above checks each column for missing values. It returns True for a column that has at least one missing value and False otherwise.

I took a sample dataset that has information about heights.

We can see that it has no missing values.
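Here is a minimal sketch of the missing-value check. The DataFrame and its "Height" and "Weight" columns are made up for illustration and are not the dataset used in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are assumptions for this sketch.
data = pd.DataFrame({
    "Height": [150.0, 162.5, np.nan, 171.0],
    "Weight": [52.0, 60.5, 70.0, 68.0],
})

# True for every column that contains at least one missing value.
has_missing = data.isna().any()

# Number of missing values in each column.
missing_counts = data.isna().sum()

print(has_missing)
print(missing_counts)
```

Counting with `isna().sum()` is a handy follow-up because it tells you how much is missing, not only whether something is missing.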

Identifying outliers

Outliers are data points that are far from other data points. In other words, they’re unusual values in a dataset. Outliers are problematic for many statistical analyses because they can cause tests to either miss significant findings or distort real results.

A few ways to find outliers

  1. Sort your data to find outliers.
data_sorted = data.sort_values(by="Height", ascending=True)  # "Height" is an assumed column name
data_sorted

2. Use graphs

data.plot.hist()
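As a rough sketch of the graph approach, the snippet below draws a histogram of a made-up "Height" column. The values, the column name and the output file name are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical heights; 250 is deliberately far away from the rest.
data = pd.DataFrame({"Height": [158, 160, 162, 165, 165, 168, 170, 172, 250]})

# Histogram of the column; an isolated bar far from the others hints at an outlier.
ax = data["Height"].plot.hist(bins=10, edgecolor="black")
ax.set_xlabel("Height")
ax.set_title("Distribution of heights")

# Save the figure (use plt.show() instead in an interactive session).
plt.savefig("height_hist.png")
```

In the saved figure, most bars sit close together on the left and a single bar stands alone on the right, which is the value worth a closer look.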


3. Using the Interquartile Range to Create Outlier Fences
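A small sketch of the fence idea, assuming the common convention of placing the fences 1.5 times the interquartile range below the first quartile and above the third quartile. The heights below are invented for illustration.

```python
import pandas as pd

# Hypothetical heights; not the dataset used in this post.
heights = pd.Series([158, 160, 162, 165, 165, 168, 170, 172, 250], name="Height")

# First and third quartiles, and the interquartile range between them.
q1 = heights.quantile(0.25)
q3 = heights.quantile(0.75)
iqr = q3 - q1

# Fences: values outside this interval are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = heights[(heights < lower_fence) | (heights > upper_fence)]
print(lower_fence, upper_fence)
print(outliers)
```

The 1.5 multiplier is a convention, not a law; a larger multiplier flags only the more extreme values.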

After finding the outliers, try to delete that column or row from the data.
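One possible way to drop the outlier rows is sketched below. It assumes a hypothetical DataFrame with "Id" and "Height" columns and uses 1.5 × IQR fences to decide which rows to keep; whether deleting is the right call depends on your data.

```python
import pandas as pd

# Hypothetical data; the column names and values are assumptions for this sketch.
data = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Height": [158, 160, 162, 165, 165, 168, 170, 172, 250],
})

# Compute the fences from the quartiles of the column.
q1 = data["Height"].quantile(0.25)
q3 = data["Height"].quantile(0.75)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Keep only the rows whose height lies within the fences (inclusive).
within_fences = data["Height"].between(lower_fence, upper_fence)
cleaned = data[within_fences].reset_index(drop=True)

print(cleaned)
```

Filtering into a new variable such as `cleaned` leaves the original `data` untouched, so you can still compare results with and without the outliers.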
