Slice and dice, the first step in preparing a recipe

To handle the vast amount of data available today, it’s necessary to slice and dice it: break the information down and examine it from different viewpoints until you reach the right level of detail and understanding. This is the basic concept that underpins the work of data scientists.

Although this step is basic, it demands skill akin to a chef’s. Chefs need a good set of knives with different edges to slice and dice their ingredients and prepare them for a recipe.


Finding consistent patterns in language is difficult, so deciding where to slice and dice is tricky. Depending on their use, words, phrases, and sentences can have the same or different meanings.

The tools for slicing and dicing vary from basic to highly advanced, so it can be challenging to know which ones to use.

Importantly, it’s hard to envisage the ‘recipe’ you are trying to create, because textual data is rich and can generate numerous insights.

Let’s look at four techniques that dice words and slice phrases from these two sample texts:


🔪 Keyword extraction:

Keywords are sequences of one, two, or three words in a text, filtered using statistical methods like frequency counting and/or Pointwise Mutual Information (PMI). The keywords help to identify what is most talked about in the text.

For example, identifying the keywords in sample text 1:

Keywords identified from the sample text 1
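The article doesn’t show how its keywords were computed, so the following is only a minimal sketch of the two statistics mentioned above, assuming plain Python and a small hand-picked stopword list (`STOPWORDS`, `tokenize`, `frequent_ngrams` and `bigram_pmi` are hypothetical names). Frequency counting ranks n-grams by how often they occur; PMI scores a word pair by how much more often it occurs together than chance would predict: PMI(x, y) = log2(p(x, y) / (p(x) p(y))).

```python
import math
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption; real pipelines use larger lists).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "it", "my", "on", "for", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-word sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_ngrams(tokens, n, top_k=5):
    """Count n-grams, skipping any that start or end with a stopword."""
    counts = Counter(
        g for g in ngrams(tokens, n)
        if g[0] not in STOPWORDS and g[-1] not in STOPWORDS
    )
    return counts.most_common(top_k)


def bigram_pmi(tokens, min_count=2):
    """Score bigrams by pointwise mutual information:
    PMI(x, y) = log2(p(x, y) / (p(x) * p(y)))."""
    unigrams = Counter(tokens)
    bigrams = Counter(ngrams(tokens, 2))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (x, y), c in bigrams.items():
        if c < min_count or x in STOPWORDS or y in STOPWORDS:
            continue  # rare or stopword pairs give unreliable PMI scores
        p_xy = c / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

On a made-up text such as “My cat loves the taste of wet food. The cat food is cheap, but wet food makes my cat happy.”, the top unigrams are ‘cat’ and ‘food’, and ‘wet food’ is the only bigram `bigram_pmi` scores, because it is the only stopword-free pair that occurs at least twice. The `min_count` filter matters because PMI tends to overrate very rare pairs.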

🔪 Topic modeling:

This technique is applied to a large corpus of data to identify groups of words that share a similar context, based on the statistical probability of the words occurring together. A topic name is then assigned to each group based on what is observed. This can be subjective; for example, ‘Topic 3’ below could be called ‘Pet problems’.
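The article doesn’t name the topic model it uses. Latent Dirichlet Allocation (LDA) is a common choice, so here is a hedged, minimal sketch of LDA trained with collapsed Gibbs sampling in plain Python; `lda_gibbs`, `top_words` and the default `alpha`, `beta` and `iterations` values are illustrative assumptions. Each token is repeatedly reassigned to a topic with probability proportional to how common that topic is in its document and how common the word is in that topic, so words that tend to occur together drift into the same topic.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Minimal collapsed Gibbs sampler for LDA.
    docs: list of token lists. Returns per-topic word counts."""
    rng = random.Random(seed)
    v = len({w for doc in docs for w in doc})           # vocabulary size
    doc_topic = [[0] * k for _ in docs]                 # n(d, t)
    topic_word = [defaultdict(int) for _ in range(k)]   # n(t, w)
    topic_total = [0] * k                               # n(t)
    assignments = []
    # Start with a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            t = rng.randrange(k)
            z_doc.append(t)
            doc_topic[d][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][t] -= 1
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                # p(t | rest) is proportional to
                # (n(d,t) + alpha) * (n(t,w) + beta) / (n(t) + V * beta)
                weights = [
                    (doc_topic[d][j] + alpha) * (topic_word[j][w] + beta)
                    / (topic_total[j] + v * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                # Record the newly sampled topic.
                assignments[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1
    return topic_word


def top_words(topic_word, n=3):
    """List up to n words with the highest counts in each topic."""
    result = []
    for tw in topic_word:
        ranked = sorted((kv for kv in tw.items() if kv[1] > 0),
                        key=lambda kv: (-kv[1], kv[0]))
        result.append([w for w, _ in ranked[:n]])
    return result
```

The output is only groups of words; naming each group (for example, ‘Pet problems’) remains a human judgment. For real corpora, a tested library implementation would normally be preferable to this sketch.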


🔪 Noun phrases:

These are phrases built around a noun and typically preceded by a ‘determiner’ (such as ‘a’, ‘the’, ‘some’, ‘this’). Expanded noun phrases give more detail by adding adjectives, such as ‘male’, ‘huge’, and ‘colorful’.
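A simple noun-phrase chunker can be sketched as a pattern over part-of-speech tags: an optional determiner, any number of adjectives, then one or more nouns. The sketch below assumes the words have already been tagged with Penn Treebank tags (DT, JJ, NN and so on) by a tagger; the tags, the hand-tagged input and the `noun_phrases` name are illustrative assumptions, not the article’s pipeline.

```python
def noun_phrases(tagged):
    """Extract phrases matching DT? JJ* NN+ from (word, tag) pairs.
    Tags follow Penn Treebank conventions: DT determiner, JJ* adjective, NN* noun."""
    phrases = []
    i = 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":  # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1].startswith("JJ"):  # adjectives
            j += 1
        first_noun = j
        while j < len(tagged) and tagged[j][1].startswith("NN"):  # one or more nouns
            j += 1
        if j > first_noun:
            phrases.append(" ".join(w for w, _ in tagged[i:j]))
            i = j  # continue after the matched phrase
        else:
            i += 1  # no noun here; try the next position
    return phrases
```

For the hand-tagged sentence “The huge colorful parrot bit a male cat”, it returns ‘The huge colorful parrot’ and ‘a male cat’.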

🔪 Dependency grammar:

This process captures the relationships between words in a sentence or piece of text using the grammatical structure of that text. Let’s take a closer look at the diagram and the phrase “cat loves the taste”.

Grammar pattern diagram

The ROOT word is ‘loves’, and the pattern “nsubj ROOT dobj”, generated by combining serial relationships, gives us the phrase “cat loves the taste.” This is one of the most valuable patterns and resembles the subject-verb-object relationship in English grammar. So we can see how combining the right set of dependency relations can extract meaningful phrases from text.
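To make the pattern concrete, the sketch below hard-codes a parse of “cat loves the taste” as (word, head, label) triples, the kind of output a dependency parser produces, and extracts every nsubj ROOT dobj combination together with a few direct modifiers of the subject and object, such as determiners. The `Token` structure, the `keep` labels and the function names are hypothetical; label sets also differ between parsers (Universal Dependencies, for instance, uses `obj` rather than `dobj`).

```python
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    head: int  # index of the head token (by convention here, ROOT points to itself)
    dep: str   # dependency label, e.g. "nsubj", "dobj", "det", "ROOT"


def with_modifiers(tokens, idx, keep=("det", "amod", "compound")):
    """Return idx plus the indices of its direct children whose label is in keep."""
    return [idx] + [i for i, t in enumerate(tokens)
                    if t.head == idx and i != idx and t.dep in keep]


def svo_phrases(tokens):
    """Find 'nsubj ROOT dobj' patterns and return them as phrases in word order."""
    phrases = []
    for r, root in enumerate(tokens):
        if root.dep != "ROOT":
            continue
        subjects = [i for i, t in enumerate(tokens) if t.head == r and t.dep == "nsubj"]
        objects = [i for i, t in enumerate(tokens) if t.head == r and t.dep == "dobj"]
        for s in subjects:
            for o in objects:
                idxs = sorted(with_modifiers(tokens, s) + [r] + with_modifiers(tokens, o))
                phrases.append(" ".join(tokens[i].text for i in idxs))
    return phrases


# Hand-written parse of "cat loves the taste" (an assumption, not parser output).
sentence = [
    Token("cat", 1, "nsubj"),
    Token("loves", 1, "ROOT"),
    Token("the", 3, "det"),
    Token("taste", 1, "dobj"),
]
```

Calling `svo_phrases(sentence)` returns `["cat loves the taste"]`, while a parse with no direct object, such as “cat sleeps”, yields no phrase for this pattern.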

There can be thousands of possible patterns in a corpus of text data of mixed sizes, but not all are relevant or produce meaningful phrases.

From the two sample texts above, we get 1,722 patterns, of which only 35 provide meaningful phrases! And even then, you must find a way to group similar ideas.

Lexical semantics

Slicing and dicing in textual analytics is known as ‘lexical semantics’: deconstructing the words and phrases within the text. To return to our chef analogy, these words and phrases don’t, in themselves, provide meaning in context or insights (the difference between apples and apple pie).

Difference between apples and apple pie

For many analytics providers, this is the start and end of insights. In fact, it’s just the beginning. From here, the challenges go beyond the technical ones: they concern what has been called ‘the last mile’, which requires human engagement, labeling, and interpretation.

The 2020 GRIT report bore this out:

“What is striking is that both Buyers and Suppliers equally indicate that business knowledge (68% for both) is a high priority skill their organization needs and technical/computer expertise is the lowest (28% for Buyers and 42% Suppliers).”

‘The last mile’ hasn’t evolved as rapidly as the technical parts of NLP. Many analytics providers reach a certain point and then offer what they ‘conveniently’ position as a ‘self-service’ or ‘human in the loop’ service. By doing this, they effectively offload the last mile, because they lack the subject expertise (like the chef’s knowledge) needed to complement data science efforts.


Data scientists + subject matter experts = clear recipe + great final dish (insights)

Thanks for reading through to the end. 🐱

If you’d like to learn more about how we go further, please read our article, “Demystifying AI”.

