Monday, 24 November 2014

Week 6 (24.11-30.11.14): Big Data

Big data is a vague term for a massive phenomenon that has rapidly become an obsession with entrepreneurs, scientists, governments and the media. It is the hottest topic in IT next to the cloud. Moreover, these two ideas are closely related as the massive growth in the quantity of data has been generated through cloud computing.


The continuous increase in the volume and detail of data captured by organizations, driven by the rise of social media, the Internet of Things (IoT) and multimedia, has produced an overwhelming flow of data in both structured and unstructured formats. Data is being created at a record rate; this phenomenon, referred to here as big data, has emerged as a widely recognized trend and has gained attention from academia, government and industry.


Big data can be characterized by three aspects:
(a) data are numerous,
(b) data cannot be categorized into regular relational databases,
(c) data are generated, captured, and processed rapidly.


Moreover, big data is transforming healthcare, science, engineering, finance, business and, eventually, society. Advances in data storage and mining technologies make it possible to preserve ever-increasing amounts of data, accompanied by a change in the nature of the data held by organizations. The rate at which new data are being generated is staggering. A major challenge for researchers and practitioners is that this growth rate exceeds their ability to design appropriate cloud computing platforms for data analysis and update-intensive workloads.


Please read the following article and answer the questions.


R. L. Villars, C. W. Olofson, M. Eastwood, Big data: what it is and why you should care, White Paper, IDC, 2011, MA, USA.

http://sites.amd.com/fr/Documents/IDC_AMD_Big_Data_Whitepaper.pdf


1. We gather more and more data, but does it provide in your opinion much more useful information?
2. What IT challenges related to big data can you think of?
3. Do you see any threats related to big data?

29 comments:

  1. We gather more and more data, but does it provide in your opinion much more useful information?

    Yes, but we must think about how we can do that.
    Recording and organizing data may take different forms, depending on the kind of information you’re collecting. The way you collect your data should relate to how you’re planning to analyze and use it. Regardless of what method you decide to use, recording should be done concurrently with data collection if possible, or soon afterwards, so that nothing gets lost and memory doesn’t fade.

  2. Do you see any threats related to big data?

    Here are the main threats to data, both internal and external:

    - accidental or intentional violations of security rules by staff or providers
    - organized crime and targeted industrial espionage
    - loss of data, including data stored in corrupted records or unintentionally deleted
    - malware and sophisticated computer viruses
    - failed or obsolete security mechanisms (e.g. firewalls)
    - the inability to produce the required information for an audit or similar event
    - natural disasters and hardware failures

  3. Q1. We gather more and more data, but does it provide in your opinion much more useful information?
    I would divide the answer into two areas: private data and business data. In my opinion, a growing amount of private data will not provide any useful information. Having collected terabytes of images and movies, we only have trouble with archiving, because we don't want to delete them.
    On the other hand, more and more business data can feed into the decision-making process. The key point is to build a suitable data model and suitable methods of data analysis.

    Replies
    1. Note that in the era of cloud storage, private data is not private anymore.

  4. Q2. What IT challenges related to big data can you think of?
    Having read the paper, I think there are many IT challenges. The first is new data models; the second is modern data storage architectures. Once these areas have changed, new architectures for PCs and mobile devices, as well as new methods of data analysis, will be needed.

  5. This comment has been removed by the author.

  6. Q3. Do you see any threats related to big data?
    In my opinion the most important aspect of big data volumes is the data security policy. The wider the database architecture (cloud computing), the more data security threats there are.

  7. Q1. We gather more and more data, but does it provide in your opinion much more useful information?

    I used to build risk models, so I have to say that it is true. The more data we can collect, the better we can optimize for any occurrence. Moreover, the more varied the data we use, the more sophisticated the econometric modeling methods we can apply.

  8. Q2. What IT challenges related to big data can you think of?

    I think of providing the data to a client in a user-friendly way. Handling large amounts of data is a challenge for Business Intelligence. The raw data needs to be easily transformed into useful and convincing information in order to support business decisions.

    Replies
    1. That is an interesting insight - most opinions oscillate around data storage and computation.

  9. Q3. Do you see any threats related to big data?

    The main threat that I can see is the loss or theft of sensitive data such as ID, address and date of birth. Fraudsters use this information to create false identities.

  10. 1. We gather more and more data, but does it provide in your opinion much more useful information?

    I think it all depends on software developers. A more detailed model of objects should generate a more useful report. Of course, in the neural network world there is the case where an overly complicated network architecture produces an overtrained result: the network gives answers to sophisticated queries but fails to answer simple ones. However, analyzing databases is something different from developing successful artificial neural networks. This is an opportunity for IT to build more complex software.
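
The overtraining effect mentioned above can be sketched with a toy analogy. This is not a neural network: as an assumption made purely for illustration, a polynomial that passes through every training point stands in for the overly complicated model, and a straight line stands in for the simple one. The data and all function names are hypothetical.

```python
# Illustrative analogy only: polynomials stand in for the neural network
# discussed in the comment. All data and names are made up for this sketch.

def lagrange_predict(xs, ys, x):
    """Evaluate the polynomial that passes through every (xs, ys) point at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on the given points."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training data: the true relation is y = x, with noise of +1 / -1.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.0, 0.0, 3.0, 2.0, 5.0, 4.0]

# Unseen test points that lie exactly on the noise-free relation y = x.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = list(test_x)

slope, intercept = fit_line(train_x, train_y)


def simple_model(x):
    """Straight line: too simple to memorize the noise."""
    return slope * x + intercept


def complex_model(x):
    """Degree-5 polynomial: flexible enough to memorize every training point."""
    return lagrange_predict(train_x, train_y, x)


if __name__ == "__main__":
    for name, model in (("simple", simple_model), ("complex", complex_model)):
        print(name,
              "train MSE:", round(mse(model, train_x, train_y), 4),
              "test MSE:", round(mse(model, test_x, test_y), 4))
```

In this sketch the complex model reproduces the training points exactly, yet it does much worse than the straight line on the unseen points, which is the kind of behaviour the comment describes.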

    2. What IT challenges related to big data can you think of?

    I can imagine that creating software is a challenge. We can’t rely only on Hadoop and Teradata. In the future there will be more options, as there are for SQL relational databases.

    3. Do you see any threats related to big data?

    There is a serious problem with our privacy. Today, anybody can be traced by their mobile phone, and any information about their internet activity can be stored. Big data is another tool for controlling people.

  11. 1. We gather more and more data, but does it provide in your opinion much more useful information?

    That's the whole point of big data analysis - to mine useful information from all the gathered data. Otherwise the data is useless and only overwhelms a company by slowing down their operations. Big data can help to optimize the way a company functions and find new areas with potential for growth. This kind of information is very valuable.
    On the other hand, there are additional costs related to handling and analyzing big data. It might not always be worth it to deploy such systems.

    2. What IT challenges related to big data can you think of?

    The main challenge according to the article is capturing, managing and analyzing the data in an acceptable time frame. But even before that there's also the challenge to demonstrate how big data can help a company.
    The article also states that finding the expertise to handle big data is a challenge in itself, but the situation is constantly improving because of competitive pressure.

    3. Do you see any threats related to big data?

    One of the risks is being responsible for the maintenance of big data in a highly changeable technological environment. That's why sometimes outsourcing can be more beneficial to a company, because then a specialized service provider will take care of the storage and processing requirements.
    Another threat is that the volume of big data is expanding faster than the growth in hardware capabilities predicted by Moore's law (this is stated in the article). New architectures are necessary to achieve the required computing power.
    There's also the issue of privacy as mentioned by Piotr above. The "right to be forgotten" is one solution to this problem. A company should not be allowed to gather more data than we're comfortable sharing just because we now have the technology to acquire all this data.

    Replies
    1. Thanks for your insights.
      I agree with you - more data doesn't always mean better information.

  12. 1. We gather more and more data, but does it provide in your opinion much more useful information?

    In my opinion the more data we have the better. Even if we don’t know how to analyze some kind of information, we might be able to do something useful with it in the future.

    Also, having a lot of data raises the question - what do we do with all this information? Can it be useful? Without such quantities of data we wouldn’t be asking those questions - we would be asking ourselves how can we gather more data. In general when you have an excess of something, you try to figure out what to do with it.

    It’s hard to tell if this data is providing much more useful information right now, but it has the potential to provide us with information and dependencies which we wouldn’t be able to discover in smaller amounts of data. As it’s a relatively new problem, we’re learning how to handle it and we’re probably still not very good at it. But new ideas and approaches are emerging (like Deep Learning), and hopefully all the information will be put to good use.

    I’m not discussing the problem of privacy and data collection, as it’s a separate subject.

    2. What IT challenges related to big data can you think of?

    I think that the main challenges related to Big Data are data processing and analysis. Naturally there are other issues like storage and processing power, but after reading the article I’m reassured that AMD has them covered.

    3. Do you see any threats related to big data?

    I think the main threats are related to security and privacy. Like any other information, it can be stolen and used for malicious purposes.

    It can also be gathered without our knowledge and used by various entities in ways which we would find undesirable.

    Replies
    1. Ad. 1 - I agree with you. Some data, though, loses value over time.

  13. 1. We gather more and more data, but does it provide in your opinion much more useful information?

    We gather more and more data, and I think that it generates more and more confusion. The human brain is not capable of such high-dimensional analysis of correlated events. I think that introducing BigData (whatever is behind that label) to both corporations and SMEs (small and medium-sized enterprises) may be beneficial only if one knows what to do with it and how to perform effective analysis.

    I think that we lack good, easy-to-use BigData tools that can be used by both technical and non-technical employees and that are, on the other hand, flexible enough to fit business needs.

    2. What IT challenges related to big data can you think of?

    I think the main challenges are related to UX. BigData is nice to have, but both employees and employers need to feel the benefits and the help it brings to the company and to their position at work.

    In terms of performance - data becomes more and more complex, but the performance (and its price and availability) of cloud-based systems becomes even more pleasing. Therefore performance > challenges.

    3. Do you see any threats related to big data?

    I think that having BigData reports and analysis brings a lot of positive information about "how the business performs". On the other hand, having that information brings the threat of losing it to a competitor, who could simply use it for their own benefit.

    Another threat? Optimization and the reduction of unnecessary positions (people losing jobs) at workplaces, but that's more of a benefit that comes from conducting your business the proper way.

    Replies
    1. Ad. 3 - And that might be another field for business - data repositories, etc.

  14. 1. We gather more and more data, but does it provide in your opinion much more useful information?

    A:/> For Google, Facebook.... that means money. Knowledge is money, though. At its core, the shape doesn't matter; what mostly matters is how personal it is. Yet some knowledge isn't personal, and the story behind it makes it valuable (science, stock markets etc.)

    IMHO, the information makes more sense when the data/knowledge has some value. Indeed it's useful. As long as there are the right tools to cherry-pick whatever we are looking for, I don't see any reason to think otherwise.

    2. What IT challenges related to big data can you think of?

    A:/> Expecting old school databases to handle TBs/PBs of data. Unfortunately, there are not many IT people around with Hadoop / clustering experience (or something similar).

    3. Do you see any threats related to big data?

    A:/> Hopefully all of us. I'm sick and tired of it whenever an app, website etc. tries to collect some of my personal information. I don't know if it's only me or many others, but whenever I hear the term 'big data', it feels like someone is trying to get into my personal life. Someone should change the name though.

    Replies
    1. Ad. 3 - Most probably this term will be evolving. Who remembers the whole buzz with Web 2.0?

  15. 1. We gather more and more data, but does it provide in your opinion much more useful information?

    Every piece of information is useful for a particular cause. Taking this into consideration, adding data that forms information is a positive thing. The problem, though, is often that we can't see the information that the data provides. We usually know what information we are searching for, which makes it easier for us to choose the data that concerns it. Our gathering capabilities are limited, and this is the main reason why gathering huge amounts of data currently seems ineffective.

    2. What IT challenges related to big data can you think of?

    Big data is not really my domain but according to this article: http://www.sas.com/resources/asset/five-big-data-challenges-article.pdf , the main challenges that are related mostly to visualization of data are:
    1. Meeting the need for speed
    2. Understanding the data (I also pointed this out in my answer to the previous question)
    3. Addressing data quality
    4. Displaying meaningful results
    5. Dealing with outliers

    3. Do you see any threats related to big data?

    Probably resource related. To provide storage options that can handle it we will have to use a lot of energy resources.

  16. This comment has been removed by the author.

  17. 1. We gather more and more data, but does it provide in your opinion much more useful information?

    I agree with Michail that every piece of information is useful for a particular cause.
    Large companies spend big money on systems for big data analysis. Such systems allow them to optimize their processes and, in the case of companies selling various products, to increase sales. Many large chain stores, such as Real, also use data analysis to decide where to place products. For instance, it is no coincidence that washing powder lies near the liquid detergent :) and it is also no coincidence that the bread is always at the end of the store, because when we walk through the whole store we always buy something additional :)


    2. What IT challenges related to big data can you think of?


    I think that the biggest problem with big data is associated with limits on computing power.
    For the analyses to be useful, they should be delivered as soon as possible. Unfortunately, this involves large expenditures on computers with high computing power. Another problem is data quality: the point is to capture a large amount of data and find the most important links within it.

    3. Do you see any threats related to big data?

    It depends on what data is collected and how it is used. One of the risks may be that companies find out a lot about us and, thanks to that, can manipulate us :)

    Replies
    1. Ad. 2. Hopefully, the need for computing power will be partially mitigated by the development of new algorithms.
