fabiopressi.it

2. Data Analytics Lifecycle

Discovery

Data Discovery

In Phase 1, the team learns the business domain, including relevant history, such as whether the organization or business unit has attempted similar projects in the past from which it can learn. The team assesses the resources available to support the project in terms of people, technology, time, and data. Important activities in this phase include framing the business problem as an analytics challenge that can be addressed in subsequent phases and formulating initial hypotheses (IHs) to test and begin learning the data.

Learning the Business Domain


Understanding the domain area of the problem is essential. In many cases, data scientists will have deep computational and quantitative knowledge that can be broadly applied across many disciplines. An example of this role would be someone with an advanced degree in applied mathematics or statistics.

These data scientists have deep knowledge of the methods, techniques, and ways of applying heuristics to a variety of business and conceptual problems. Others in this area may have deep knowledge of a domain area, coupled with quantitative expertise. An example would be someone with a Ph.D. in life sciences: this person would have deep knowledge of a field of study, such as oceanography, biology, or genetics, along with some depth of quantitative knowledge.

At this early stage in the process, the team needs to determine how much business or domain knowledge the data scientist needs to develop models in Phases 3 and 4. The earlier the team can make this assessment the better, because the decision helps dictate the resources needed for the project team and ensures the team has the right balance of domain knowledge and technical expertise.

Data Science and Big Data Analytics

Resources

As part of the discovery phase, the team needs to assess the resources available to support the project. In this context, resources include technology, tools, systems, data, and people.

In addition to the skills and computing resources, it is advisable to take inventory of the types of data available to the team for the project. Consider whether the available data is sufficient to support the project’s goals. The team will need to determine whether it must collect additional data, purchase it from outside sources, or transform existing data. Projects are often started by looking only at the data available. When the data is less than hoped for, the size and scope of the project are reduced to work within the constraints of the existing data.

After taking inventory of the tools, technology, data, and people, consider whether the team has sufficient resources to succeed on this project or whether additional resources are needed. Negotiating for resources at the outset of the project, while scoping the goals, objectives, and feasibility, is generally more effective than doing so later in the process, and it ensures sufficient time to execute the project properly. Project managers and key stakeholders have better success negotiating for the right resources at this stage than later, once the project is underway.
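The inventory-and-gap check described above can be sketched in a few lines of Python. This is a minimal illustration, not a method from the source: the `ResourceInventory` class, its fields, and the sample items are hypothetical, and the five categories are simply the ones the text names (technology, tools, systems, data, and people).

```python
from dataclasses import dataclass

# Resource categories named in the text; the structure around them is hypothetical.
CATEGORIES = ("technology", "tools", "systems", "data", "people")


@dataclass
class ResourceInventory:
    available: dict[str, set[str]]  # what the team already has, per category
    required: dict[str, set[str]]   # what the project scope calls for, per category

    def gaps(self) -> dict[str, set[str]]:
        """Return, per category, the required items the team does not yet have."""
        missing: dict[str, set[str]] = {}
        for category in CATEGORIES:
            need = self.required.get(category, set())
            have = self.available.get(category, set())
            if need - have:
                missing[category] = need - have
        return missing


# Hypothetical example: the team has Python, SQL, and one data scientist.
inventory = ResourceInventory(
    available={"tools": {"python", "sql"}, "people": {"data scientist"}},
    required={
        "tools": {"python", "sql", "spark"},
        "people": {"data scientist", "domain expert"},
        "data": {"transaction history"},
    },
)
print(inventory.gaps())
```

An empty result suggests the inventory covers the planned scope; a non-empty one is the list of resources to negotiate for at the outset rather than mid-project.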

Framing the Problem

At this point, it is a best practice to write down the problem statement and share it with the key stakeholders. Each team member may hear slightly different things related to the needs and the problem and have somewhat different ideas of possible solutions. For these reasons, it is crucial to state the analytics problem, as well as why and to whom it is important. Essentially, the team needs to clearly articulate the current situation and its main challenges.
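One way to make the written problem statement concrete is a small template whose fields mirror the elements listed above: the analytics problem, why it is important, to whom, the current situation, and its main challenges. The `ProblemStatement` class and the churn example are hypothetical illustrations; the template simply flags elements that are still blank before the draft is shared with stakeholders.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemStatement:
    # Hypothetical template; field names mirror the elements named in the text.
    problem: str            # the analytics problem, in one sentence
    why_important: str      # why solving it matters
    for_whom: list[str]     # who it matters to
    current_situation: str
    main_challenges: list[str] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """Names of elements still empty, to resolve before sharing the draft."""
        return [name for name, value in vars(self).items() if not value]

    def to_text(self) -> str:
        """Plain-text version suitable for circulating to key stakeholders."""
        return "\n".join([
            f"Problem: {self.problem}",
            f"Why it matters: {self.why_important}",
            f"For whom: {', '.join(self.for_whom)}",
            f"Current situation: {self.current_situation}",
            f"Main challenges: {'; '.join(self.main_challenges)}",
        ])


# Hypothetical first draft, with two elements not yet agreed.
draft = ProblemStatement(
    problem="Reduce churn among subscription customers",
    why_important="Churn erodes recurring revenue",
    for_whom=["marketing", "finance"],
    current_situation="",
)
print(draft.gaps())
print(draft.to_text())
```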

Identifying Key Stakeholders

Another important step is to identify the key stakeholders and their interests in the project. During these discussions, the team can identify the success criteria and key risks, as well as the stakeholders themselves; these should include anyone who will benefit from the project or be significantly affected by it. When interviewing stakeholders, learn about the domain area and any relevant history from similar analytics projects. For example, the team may identify the results each stakeholder wants from the project and the criteria each stakeholder will use to judge the project’s success.
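Recording each stakeholder's success criteria as a measurable target during discovery makes them checkable once results exist. The sketch below is a hypothetical illustration: the `SuccessCriterion` class, the `unmet` helper, and the stakeholders, metrics, and targets are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class SuccessCriterion:
    stakeholder: str
    metric: str
    target: float
    higher_is_better: bool = True

    def is_met(self, observed: float) -> bool:
        """Compare an observed value against the agreed target."""
        if self.higher_is_better:
            return observed >= self.target
        return observed <= self.target


def unmet(criteria: list[SuccessCriterion], results: dict[str, float]) -> list[SuccessCriterion]:
    """Criteria that are not met, including those with no observed result yet."""
    return [c for c in criteria
            if c.metric not in results or not c.is_met(results[c.metric])]


# Hypothetical stakeholders, metrics, and targets agreed during discovery.
criteria = [
    SuccessCriterion("marketing lead", "response_rate", 0.05),
    SuccessCriterion("operations lead", "processing_hours", 2.0, higher_is_better=False),
]
results = {"response_rate": 0.06, "processing_hours": 2.5}
print([c.stakeholder for c in unmet(criteria, results)])
```

Treating a missing result as unmet is a deliberate choice: a criterion nobody has measured should not silently count as a success.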

Interviewing the Analytics Sponsor

The team should plan to collaborate with the stakeholders to clarify and frame the analytics problem. At the outset, project sponsors may have a predetermined solution that does not necessarily realize the desired outcome. In these cases, the team must use its knowledge and expertise to identify the true underlying problem and an appropriate solution.

Developing Initial Hypotheses

Developing a set of IHs is a key facet of the discovery phase. This step involves forming ideas that the team can test with data. Generally, it is best to come up with a few primary hypotheses to test and then be creative about developing several more.

Another part of this process involves gathering and assessing hypotheses from stakeholders and domain experts who may have their own perspective on what the problem is, what the solution should be, and how to arrive at a solution. These stakeholders know the domain area well and can offer suggestions on ideas to test as the team formulates hypotheses during this phase. The team will likely collect many ideas that may illuminate the operating assumptions of the stakeholders. These ideas will also give the team opportunities to expand the project scope into adjacent spaces where it makes sense, or to design experiments in a meaningful way to address the most important interests of the stakeholders. As part of this exercise, it can be useful to obtain and explore some initial data to inform discussions with stakeholders during the hypothesis-forming stage.
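A quick way to explore initial data against a candidate hypothesis is a permutation test. This is one illustrative choice, not a procedure the text prescribes. The hypothesis ("customers contacted by a campaign spend more") and the spend figures are invented for the example, and with samples this small the result should inform discussion rather than settle the question.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """One-sided permutation test of H1: mean(group_a) > mean(group_b).

    Returns the observed difference in means and an approximate p-value:
    the share of random relabelings whose difference is at least as large.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign group labels
        if mean(pooled[:n_a]) - mean(pooled[n_a:]) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (extreme + 1) / (n_resamples + 1)


# Hypothetical initial sample: monthly spend, contacted vs. not-contacted customers.
contacted = [52, 61, 58, 70, 66, 59]
not_contacted = [48, 50, 55, 47, 53, 51]
observed, p_value = permutation_test(contacted, not_contacted)
print(f"difference in means: {observed:.2f}, approximate p-value: {p_value:.4f}")
```

The test makes no distributional assumptions beyond exchangeability of labels under the null hypothesis, which suits the small, exploratory samples typical of the discovery phase.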

Identifying Potential Data Sources

As part of the discovery phase, identify the kinds of data the team will need to solve the problem. Consider the volume, type, and time span of the data needed to test the hypotheses. Ensure that the team can access more than simply aggregated data. In most cases, the team will need the raw data to avoid introducing bias for the downstream analysis.
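A small numeric illustration of why aggregated extracts can mislead: averaging per-group averages ignores how many records sit behind each group. The store names and spend values below are hypothetical.

```python
from statistics import mean

# Hypothetical raw data: one spend value per customer, keyed by store.
raw = {
    "store_a": [20.0] * 1000,  # many customers, low spend
    "store_b": [100.0] * 10,   # few customers, high spend
}

# An aggregated extract typically provides only one average per store.
store_averages = {store: mean(values) for store, values in raw.items()}
average_of_averages = mean(store_averages.values())

# With the raw rows, every customer counts once.
customer_mean = mean(v for values in raw.values() for v in values)

print(f"average of store averages: {average_of_averages:.2f}")
print(f"mean over raw customers:   {customer_mean:.2f}")
```

Here the aggregate-based figure is roughly three times the customer-level mean. Carrying a count alongside each aggregate would allow a weighted mean to be recovered, but questions about spread, outliers, or individual behavior still need the raw records.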

  • Identify data sources:

  • Capture aggregate data sources:

  • Review the raw data:

  • Evaluate the data structures and tools needed:

  • Scope the sort of data infrastructure needed for this type of problem:
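The activities above can be supported by a simple register of candidate sources, checked against the volume, time span, and raw-data needs of the hypotheses. This is a hypothetical sketch: the `DataSource` fields, the `review` function, and the sample sources and thresholds are assumptions for illustration. The data type is recorded to inform the review of data structures and tools but is not checked here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataSource:
    name: str
    data_type: str       # e.g. "structured", "semi-structured"; recorded, not checked
    rows: int            # approximate volume
    start: date          # earliest record available
    end: date            # latest record available
    raw_available: bool  # False if only aggregates can be obtained


def review(sources, needed_start, needed_end, min_rows):
    """Flag candidate sources that fall short of what the hypotheses require."""
    issues = {}
    for s in sources:
        problems = []
        if s.start > needed_start or s.end < needed_end:
            problems.append("time span too short")
        if s.rows < min_rows:
            problems.append("volume too small")
        if not s.raw_available:
            problems.append("aggregate only")
        if problems:
            issues[s.name] = problems
    return issues


# Hypothetical candidate sources and requirements.
sources = [
    DataSource("crm_extract", "structured", 2_000_000, date(2019, 1, 1), date(2024, 12, 31), True),
    DataSource("web_summary", "semi-structured", 5_000, date(2023, 1, 1), date(2024, 12, 31), False),
]
print(review(sources, date(2020, 1, 1), date(2024, 12, 31), min_rows=100_000))
```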




Fabio Pressi - Only for fun Update 5/1/2025