Different names for data analysis

Welcome to the Shape of Data blog. Over the next few months, I plan to write a number of posts illustrating how understanding the geometry behind data analysis can lead to deeper insights and a more intuitive understanding of the data. But before we delve into geometry, it might be a good idea to discuss the many different names that are often used to describe the analysis of large data sets.

Most of these names have become essentially synonyms of each other, but they all have different origins and nuances. In academia, at least, a researcher’s intellectual background has a big impact on who they know and work with, what conferences they go to and what terminology they use. I don’t know how much of a difference it makes outside the university setting, but regardless, I think it’s a good idea to know the subtle implications of these different terms.

Machine Learning – This term comes from computer science, particularly artificial intelligence. Originally, the idea was to have a computer “learn” a pattern from a data set, then make decisions about new situations based on what it had learned. This is now called supervised learning (classification is the most familiar example), and machine learning has expanded to cover a much wider range of data analysis.
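
To make the “learn a pattern, then decide about new situations” idea concrete, here is a minimal sketch in Python. It uses a nearest-centroid rule, which is just one simple way to do this and not a standard that the term implies. The function names (fit_centroids, predict) and the toy data are made up for illustration.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available in Python 3.8+


def fit_centroids(points, labels):
    """Learn one centroid (mean point) per label from labeled examples."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        grouped[label].append(point)
    return {
        label: tuple(sum(coords) / len(members) for coords in zip(*members))
        for label, members in grouped.items()
    }


def predict(centroids, point):
    """Assign a new point to the label whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], point))


# Hypothetical training data: two small clusters in the plane.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
                (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
train_labels = ["low", "low", "low", "high", "high", "high"]

# "Learning" step: summarize each labeled group by its centroid.
centroids = fit_centroids(train_points, train_labels)

# "Decision" step: classify points that were not in the training data.
print(predict(centroids, (2.0, 2.0)))
print(predict(centroids, (7.0, 9.0)))
```

The learning step boils the labeled examples down to a small summary (one point per label), and the decision step only consults that summary, never the original data.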

Data Mining – This term originally referred to a subfield of statistics. In some sense, the goal of all statistics is to analyze and summarize data, but data mining is (or at least was originally) the field of statistics that focused on large, high-dimensional data sets.

Signal Processing – As the name implies, the engineering field of signal processing is the study of how to encode and decode signals. The decoding part is where the data analysis comes in, since you can think of a data set as a (possibly noisy) signal. Many of the techniques that have become standard in data analysis have their roots in signal processing.

Knowledge Discovery – Short for Knowledge Discovery in Databases (KDD), this term refers to a multi-step process in which data is accumulated in a database, analyzed, then interpreted. Technically, data mining/machine learning is just one step in this process. KDD is often associated with relational databases (such as MySQL) as opposed to the newer and less structured storage systems (such as NoSQL databases and Hadoop) usually associated with “Big Data” (see below).

(Business) Analysis/Analytics/Intelligence – As suggested by what’s in the parentheses, these terms refer specifically to the use of data analysis in business. There’s plenty of confusion about what each term means specifically, but the general consensus seems to be that analytics refers to the computational part (the processing of data) and analysis is the human part (interpreting the data and making decisions based on it). Intelligence refers mostly to accumulating and organizing the data, though some sources suggest that business intelligence (BI) can also refer to the whole process – gather, process, interpret – similar to knowledge discovery.

Data Science – Recently, “Data Scientist” has become a popular job title at companies looking for technical experts with interdisciplinary backgrounds. A number of people have pointed out that this is kind of a misnomer, since by definition, all scientists study data. If it were up to me, I would have used a term more like “business science”, since data science usually means the application of data analysis techniques to business problems. But I’ll admit that “business science” doesn’t sound as cool as “data science”.

Big Data – This term generally refers to the challenges and promises associated with larger and larger data sets. But it was also (at least originally) meant to allude to the term “Big Oil”, suggesting the massive corporations that have exploited and will continue to exploit this new resource. In practice, much of the technology associated specifically with “Big Data” (such as Hadoop) is designed for accumulating, storing and dispensing unstructured data. To gain insights from this data, it is often necessary to extract a structured form of data via the Big Data machinery, then analyze it using “small data” techniques.
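
As a rough illustration of that last point, the sketch below pulls structured records out of unstructured text and then runs an ordinary “small data” summary on the result. Everything here is an assumption made for the example: the log format, the field names and the helper extract_record are all made up, and a real Big Data system would do the extraction step at a much larger scale.

```python
import re
from statistics import mean

# Hypothetical unstructured input: free-form log lines, some of them unusable.
raw_lines = [
    "2013-01-07 user=alice action=purchase amount=19.99",
    "malformed line with no fields",
    "2013-01-07 user=bob action=view",
    "2013-01-08 user=alice action=purchase amount=5.01",
]

# Matches key=value pairs such as "user=alice".
FIELD = re.compile(r"(\w+)=(\S+)")


def extract_record(line):
    """Turn one raw line into a dict of fields, or None if no user is found."""
    fields = dict(FIELD.findall(line))
    return fields if "user" in fields else None


# Extraction step: keep only the lines that yield a structured record.
records = [r for r in map(extract_record, raw_lines) if r is not None]

# "Small data" step: a plain average over the structured purchase amounts.
purchases = [float(r["amount"]) for r in records if r.get("action") == "purchase"]
average_purchase = mean(purchases)

print(len(records), round(average_purchase, 2))
```

Once the records are in a structured form, the analysis itself needs nothing beyond the standard library.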

Informatics – Since bioinformatics is the application of data analysis to biology (particularly molecular biology), it seems like informatics should also be a synonym for data science. However, Informatik is the German word for computer science, so it would get confusing if we tried to use it in the specific sense of data analysis. Even so, it does seem to get used in that sense occasionally.

Are there any others that I missed? These were all the words I could think of, but I’m sure there are more. If you know of any, please let me know in the comments.

12 Responses to Different names for data analysis

  1. Rick Wicklin says:

    For many of these areas, the goal is to predict or model something. I use the term statistical data analysis to also include inference: the ability to estimate uncertainty in estimates.

    • That’s a good point. I’m afraid I don’t know much of the statistics behind estimating uncertainty and finding error bars, but this blog will give me a good excuse to learn. I’ll have to find a way to fit that into the discussion. Thanks!

  2. Dinesh Agrawal says:

    But these are all different components, and they must be used in an organisation as needed.
    Nice article.

  6. Kart says:

    1) You didn’t include “statistics” in your list (!), nor “information analysis”, nor “data analysis”. It would be interesting, however, to discuss the origin of these terms too, even if they are the most common. You might also want to include “econometrics” and other “metric” sciences (bibliometrics, environmetrics, sociometrics…).
    2) In French, “analyse de données” (lit. data analysis) is much more specific than in English and is restricted to the meaning of “factorial analysis” (if that is of any interest). Data analysis might be translated by “analyse de l’information”.
    3) In Danish, there is no distinction between “informatics” and “data science” (!): both are rendered by “datalogi” (lit. “science of data”).

    • Good point about statistics. I left it off the list because I wanted to focus on fields that were specifically focused on complex, large data sets. Statistics is a much broader field and is represented on the list by data mining, which (from what I understand) started out as a subfield of statistics. So I left statistics out for the same reason that I left out computer science, math and engineering.

      As for ‘data analysis’ and ‘information analysis’, I always think of these terms as referring to anything and everything on this list. (That’s how I use them, at least.) Since I didn’t know of specific meanings or connotations, I left them off.

      That’s interesting about the French and Danish terms. I guess there’s no point in trying to make the English terminology compatible with other languages, since other languages aren’t consistent with each other!

      Thanks, Kart.

  7. Guru Medasani says:

    Predictive Analytics?
