Sunday, 2 December 2018

Week 4 [3-9.12.18] Saving the digital world

Hi everyone!!!

I have found a very interesting article about saving digital data for future generations. Today many articles, photos and posts are published on the Internet. Technology keeps changing, and we have no guarantee that this data won't disappear someday or that we will still be able to open it with new programs.

Read this article https://www.nature.com/articles/d41586-018-07505-8 and answer some questions:

  1. What is the biggest problem with archiving digital data?
  2. How to protect data from being manipulated before archiving it?
  3. How can we help these people to build a library of our history?
  4. What data should we be archiving?


37 comments:

  1. Another difficult problem, which will be decided for us by some bureaucrat.


    1. What is the biggest problem with archiving digital data?

    There are at least two big ones: hardware and cost. With books we know how many will fit into a building. With a digital archive, the capacities of tomorrow will be much bigger. There is too much data to archive (although, watch this: https://www.youtube.com/watch?v=tabVaoeNtdk - there are many examples of datacenters which can hold the World's "important" information output easily).

    There are problems

    2. How to protect data from being manipulated before archiving it?

    This is where blockchain comes in? There are tools for data consistency, but we all need to start using them for once. Another way is to allow some 'freedom of speech / opinion' and to archive all versions of 'the story': both the 'official consensus' and alternative accounts of controversial topics.

    3. How can we help these people to build a library of our history?

    Licensing our works, contributing to Public Domain. We could have a family archive, which - after a few generations - could be contributed or published as historical sources (diaries of people from past ages often carry very important information). The key is to not rely on 'big companies', because they can go bankrupt and take our data with them. They can help, but everybody needs to have a secure, maybe redundant and somewhat distributed, personal archive.

    4. What data should we be archiving?

    History, culture (including entertainment), algorithms, source code, discussions on forums. Even things like this conversation, although it probably won't be very valuable to many future generations.

    Replies
    1. Thank you for your response and for the link. I agree with you that hardware and cost are big problems. I think that the licensing of commercial products is another big problem.

  2. 1. What is the biggest problem with archiving digital data?
    The problem is rapidly growing and changing technology, which requires a lot of work and effort to keep archived materials available in the coming years.

    2. How to protect data from being manipulated before archiving it?
    Digital materials have the potential to remain fluid over time, being edited or altered with ease, being damaged by media failure, or decoded into human readable information in an unreliable or inaccurate manner by rendering software. For an end user to have trust in the result of digital preservation work it requires careful consideration of the entire lifecycle of the digital materials and who or what has interacted with them over time. Information management systems need to be able to link to essential contextual information regarding the business procedures of the creating agency. Authenticity and integrity of digital resources can be equally important in other sectors. For example, scholars will need to feel confident that references they cite will stay the same over time, courts of law will need to be assured that material can withstand legal evidential requirements, government departments may well have legally enforceable requirements regarding authenticity, and so on. This issue overlaps with both legal and organisational issues and it may be one which is best resolved within individual sectors rather than through generic procedures.
    The application of data integrity techniques and the maintenance of audit trails can provide confidence that a digital object has remained unchanged (except by necessary preservation action) since deposit in an archive. Ultimately its authenticity to a user may depend much more on the broader trustworthiness of the preserving organisation as a whole. Maintaining high quality preservation processes based on current best practice and validated by appropriate audit and certification will be crucial.

    3. How can we help these people to build a library of our history?
    Maintaining a systematic process for bit preservation remains a fundamental requirement in ensuring long term digital preservation. Storage media must be monitored and refreshed. Redundancy must be introduced by replicating or backing up files, introducing diversity in dependent technologies and avoiding catastrophic disaster at a single geographical location. Checksums must be generated and frequently recalculated to identify any loss and ensure that the integrity of the bits can be verified in an efficient and automated manner. The locations in which digital materials are stored should be carefully recorded, and responsibility for their preservation allocated.

    4. What data should we be archiving?
    Those whose significance is very high, the documentation of achievements, new inventions or materials of doctoral research, especially in medicine or other important industries. Also those that have cultural and historical significance.
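The bit-preservation routine described in the comment above (generate checksums at deposit, recalculate them regularly, flag any loss) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `build_manifest` and `verify_manifest`, the choice of SHA-256 and the sample file names are hypothetical and do not come from the article or from any particular archive's tooling.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict) -> list:
    """Return the relative paths that are missing or whose bytes changed."""
    damaged = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(rel_path)
    return damaged


def demo() -> tuple:
    """Build a manifest, then simulate silent corruption of one file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "diary.txt").write_text("2 December 2018: started the archive")
        (root / "photo.bin").write_bytes(bytes(range(256)))
        manifest = build_manifest(root)          # done once, at deposit
        before = verify_manifest(root, manifest)  # periodic re-check: clean
        # Overwrite the last byte to imitate media failure.
        (root / "photo.bin").write_bytes(bytes(range(255)) + b"\x00")
        after = verify_manifest(root, manifest)   # re-check now flags the file
        return before, after


if __name__ == "__main__":
    print(demo())
```

A check like this only detects damage. Repairing it still depends on the redundant copies in other locations that the comment also calls for.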

  3. 1. What is the biggest problem with archiving digital data?

    The biggest problem with digital data archiving is the quantity! Cameras, the ability to publish any material on the Internet, more and more disks: all of this makes the amount of digital data grow geometrically or faster. Mobile data transfer alone has increased the amount of data processed fourteenfold in recent years.

    2. How to protect data from being manipulated before archiving it?

    We could write at length and passionately about the methods of protecting digital data. Packaging, MD5, digital watermarks and many other methods are used to secure data. However, in my opinion, only large data warehouses can guarantee its safety. One such place is eu-LISA, which warehouses data for EU law enforcement services in Estonia. I think the data stored there is really safe.

    3. How can we help these people to build a library of our history?

    These aren't simple things. History isn't a simple thing. For our descendants, we usually want to leave only the good things from a given decade. We Poles would not gladly leave information about, for example, the National Military Organization (NOW) and its activities after 1945.
    But choosing which data to keep is one thing, and preserving electronic data is something completely different. On that point, as I wrote above, I believe for now that only large data warehouses can guarantee the creation of valuable libraries.

    4. What data should we be archiving?

    All of it, because at the moment we do not really know what information will be useful to us in the future.

  4. 1. What is the biggest problem with archiving digital data?
    As far as I’m concerned, there are two main problems with archiving digital data: cost and rapid technological change. Archiving needs a lot of digital storage space. Today that space is expensive, so archiving enormous sets of data costs a lot. Secondly, even if we manage to archive all the data, nobody guarantees that it will be possible to restore it in the future (think of today's 3.5” floppy disks).

    2. How to protect data from being manipulated before archiving it?
    I agree with Tomasz that blockchain may be useful in this situation. There are, after all, algorithms that verify data integrity, such as checksums. They are simple but very useful. Of course, this applies to maintaining data that is already archived. How do we check data before archiving? I have no idea - maybe some algorithm for validating data...

    3. How can we help these people to build a library of our history?
    I think it is in the interest of all of us, because we create history. It seems to me that first and foremost we should cooperate. In my view the most important things are truth, openness and prudence. We are talking about our personal data, too. That’s why we are entering a psychological zone, not just a technical one.

    4. What data should we be archiving?
    Of course, only those that shape our lives - in this way history is born. Unfortunately, it depends on the person and his life, so the matter seems to be very complex.

    Replies
    1. Thank you for your response. The integrity of data is important, and I agree with you that maybe some algorithms could help. We need to generate checksums for our data. Today many people don't do it. I think that commercial licensing is the biggest problem in archiving data. We need to change the law to make it possible to save such data for the next generations.

  5. What is the biggest problem with archiving digital data?
    I think the storage space and the access. We need huge data banks that are safe, meaning safe from unauthorized access and physical damage. The second problem is that we need to be able to access and read this data, so the software has to be compatible with the data.

    How to protect data from being manipulated before archiving it?
    I think that a good idea is to store the data in many places, a few of which have no connection to the outside, so that the files can be compared. We could also use hashes to know that the data is in its original form.

    How can we help these people to build a library of our history?
    I think that we may donate to their idea. We can also share their idea with others so that they gain attention and find more sponsors to keep the idea alive. It should be a long-term project.


    What data should we be archiving?
    I think the most important data is the data related to culture. I mean music, movies and books, but also news, blogs, etc. I think everything that is interesting and not private.
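The idea in the comment above (keep copies in several places, some of them offline, and compare them by hash) can be sketched as a simple majority vote over fingerprints. This is a minimal sketch under assumptions: the function name `audit_replicas`, the site names and the rule "trust the strict majority" are hypothetical choices made for illustration, not something the article prescribes.

```python
import hashlib
from collections import Counter


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest used as the copy's fingerprint."""
    return hashlib.sha256(data).hexdigest()


def audit_replicas(replicas: dict) -> tuple:
    """Return the majority fingerprint and the sites that disagree with it.

    `replicas` maps a site name to the bytes stored at that site.
    """
    if not replicas:
        raise ValueError("no replicas to compare")
    digests = {site: fingerprint(blob) for site, blob in replicas.items()}
    majority, votes = Counter(digests.values()).most_common(1)[0]
    # Without a strict majority we cannot tell which copy is the original.
    if votes <= len(replicas) // 2:
        raise ValueError("no strict majority; manual review needed")
    suspects = sorted(site for site, d in digests.items() if d != majority)
    return majority, suspects


if __name__ == "__main__":
    copies = {
        "warsaw": b"original record",
        "offline-vault": b"original record",
        "cloud": b"original rec0rd",  # one altered character
    }
    print(audit_replicas(copies))
```

Note that a vote like this only works if the copies are independent. If an attacker can change most of the sites, the altered version wins, which is why the comment suggests keeping some copies disconnected from the outside.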

    Replies
    1. Thank you for your response. You are right that the software compatibility could be a problem.

  6. 1. What is the biggest problem with archiving digital data?

    Given the rapid development of technology, the problems are changing rapidly too: one technology helps to solve a problem but leads to additional ones.
    If we mean saving data on the user's physical media, then the question is how to keep it readable for many years.
    The alternative is storing digital data on cloud servers.

    2. How to protect data from being manipulated before archiving it?

    Definitely use blockchain. Assuming, of course, that the question is about what today's technologies can offer.

    3. How can we help these people to build a library of our history?
    After reading all the answers, I agree with Cezary Góralski's answer. I think his answer is exhaustive, and unfortunately I can’t add anything new. I learned a lot from his answer =)

    4. What data should we be archiving?
    I think it is the data that is very important: items of scientific and cultural heritage. Perhaps some scientific works and works of culture in the form of films, music, photos and pictures.

    Replies
    1. Thank you for your response. I have been thinking about blockchain. Many people mention it in their answers to my questions. I agree it could be a very good solution.

  7. 1. What is the biggest problem with archiving digital data?
    I think that the biggest problem is the increasing amount of data produced. This requires more and more effort to store them, not to mention their processing. From year to year this data will continue to grow and we face the challenge of how to store it as cost-effectively as possible.

    2. How to protect data from being manipulated before archiving it?
    This is a very difficult issue with such amounts of data. Surely we have to use some security measures that will allow us to check whether the data is what we wrote down; we can use watermarks or hash functions for that. However, there is still the problem of the authenticity of what we write down.

    3&4. How can we help these people to build a library of our history? What data should we be archiving?
    I allowed myself to combine these two questions. I think it is important what we actually want to preserve. The Internet is full of digital rubbish, you would somehow have to choose what is most important and use it to create a digital library of our history. Unless we recognise that this rubbish is also part of the history of our humanity. Then we have to write down practically everything that appears on the Internet. I don't know how we can help in this process, but first of all we should think about what we produce in it ourselves.

    Replies
    1. Thank you for your response. The rapidly growing amount of data is a problem. I think that another one is commercial licensing. Do you agree with me? I agree with you that we need to choose which data we should archive.

    2. I agree, obtaining appropriate licenses to make legal access to data is also not easy.

  8. 1. What is the biggest problem with archiving digital data?
    The biggest problem with archiving digital data is its quantity and also the speed of technology change. Of course we might archive materials, but will we be able to retrieve them in the next twenty, thirty years?

    2. How to protect data from being manipulated before archiving it?
    It's a very complex and difficult problem, but I agree with Cezary that appropriate data integrity techniques and the maintenance of an audit trail could be a good solution. In the end, it's all about providing the right control tools and putting a public, trustworthy organisation in charge of the data in question.


    3. How can we help these people to build a library of our history?

    We should cooperate, learn from mistakes of the past and develop technological tools that will enable us to create an appropriate, reliable and trustworthy storage place.

    4. What data should we be archiving?

    This is a very difficult question. Something that is worth archiving and of major importance to one person can be useless and worthless from another person's perspective. It's a matter of point of view. I guess we should preserve things that are real life changers, but then again someone has to classify data as such.

    Replies
    1. Thank you for your response. I think that we need to save the program code that will allow us to open archived data in the future. Commercial licensing could be the biggest problem.

  9. For me the biggest problem is who would own such data, which is also related to physical storage capacity. If we decide that some government agency should be responsible for storing digital data, then taxpayers would have to cover the costs of enormous data warehouses and probably a new ministry dedicated to this. It could also raise surveillance concerns. On the other hand, if we let some company own the data, we wouldn't have even the illusion of control over it. For example, such a company could easily censor or modify the content to serve its needs.

    I'm not sure what should be archived, but probably not everything. For example archiving all the tweets seems unnecessary.

  10. What is the biggest problem with archiving digital data?
    The volume of data, the speed at which that volume grows, and accessibility.


    How to protect data from being manipulated before archiving it?
    DRM, data signing, data ciphering, data deduplication.


    How can we help these people to build a library of our history?
    Data labeling and writing metadata. Working on classification, clustering and labeling of the data to be stored.


    What data should we be archiving?
    It is hard to determine which data are relevant and which are not.

    Replies
    1. Thank you for your response. You are right that it is not easy to determine which data we should archive. I think we should archive data that is important to our history and that can help people in the future learn about us and our way of life. Do you agree with me?

  11. 1. What is the biggest problem with archiving digital data?
    The biggest problem is that our data storage technology changes over the years, which forces data archivists to re-save our cultural heritage repeatedly. An additional threat is that some data could be lost during this process.

    2. How to protect data from being manipulated before archiving it?
    In my understanding, the only kinds of data manipulation done before saving are noise reduction or quality improvement, and I don’t think this is wrong. On the contrary, in my opinion this is a valuable change that will allow future generations to use and understand this data.

    3. How can we help these people to build a library of our history?
    I think that we can start by making people more aware of data archiving and encouraging them to create their own home archive with memoirs worth saving. This should raise the question of what data we should preserve, and I think it will change people's perception of our global data and of what we will leave behind.

    4. What data should we be archiving?
    We should not be attempting to clean out our history. We should preserve all of our past, all of the good and the bad. Our history creates us as we are nowadays, and it’s important to pass this lesson to the next generations.

    Replies
    1. Thank you for your response. I agree with you on point 1. This could be a problem. On point 2, I think your point of view is interesting and I agree with it. These changes are important to help read data in the future, but in my opinion the biggest problem is commercial licenses. What do you think?

  12. What is the biggest problem with archiving digital data?

    I think the biggest problem is long-term storage. The magnetic disks currently in use are not designed to last more than 10 years, and many other storage types need electricity to maintain data and might not survive a natural disaster.

    How to protect data from being manipulated before archiving it?

    I think the best solution here is pluralization. When we have many different institutions and people storing the data, 'all versions of the truth' will be saved, and historians in later ages will try to find out what this data means and which version is the right one.

    How can we help these people to build a library of our history?

    I think the most important thing here is to give people a place where they can put their data, with some institution ensuring that this storage will be permanent and that neither natural disaster nor time will destroy their library.

    What data should we be archiving?

    Of course there are a lot of important historical events and we need to ensure that all information about these events is preserved, but information about how we live, what we do, and what we produce and use should also be important for future generations, to understand today's reality.

    Replies
    1. Thank you for your response. I don't agree with you about pluralization. It will be difficult to choose the correct data.

  13. 1. What is the biggest problem with archiving digital data?
    According to the article, the biggest problems are artificially created: laws governing copyrights and other "intellectual properties". Another problem is storing data on closed platforms, which means that control over the data can be lost as a result of decisions we can't appeal.

    2. How to protect data from being manipulated before archiving it?
    The means are already there: we sign data, and propagate our public keys. The problem is we'd have to know what will go into archives beforehand.

    3. How can we help these people to build a library of our history?
    I think if we don't interfere, it'll be enough. Open formats, open licensing, keeping metadata up to date, no DRM. They'll take care of the rest.

    4. What data should we be archiving?
    Everything we care about (backups :-) ). But seriously, I think we can't possibly guess what is worth keeping, as we don't know what will be interesting for future generations. We should err on the side of keeping rather than throwing away.

    Replies
    1. Thank you for your response.
      In point 1 I agree with you. Licenses and closed platforms are the biggest obstacles to archiving data.
      In point 2 you are right that we need to know what kind of data we should archive.
      In point 3 I agree with your point of view.
      In point 4 I think there are certain types of data that are important for every generation.

  14. 1. What is the biggest problem with archiving digital data?

    • Problems related to the analysis of real data - incorrect data
    • Generating association rules
    • Problems related to the analysis of real data - numerous attributes
    • Problems with the analysis of real data - incomplete data

    2. How to protect data from being manipulated before archiving it?
    The need to encrypt sensitive data in applications results not only from good engineering practices, but often from legal regulations (personal data protection) or industry requirements (PCI-DSS). The solution discussed in this article is data encryption at the level of the database engine, or more precisely data encryption before storing them on a data carrier. It is transparent encryption for an application which, unlike record encryption, does not have to manage encryption keys on its own.

    3. How can we help these people to build a library of our history?
    Russian scientists suggest that cultural heritage sites should be stored on the Moon.
    The accumulated treasures of the material culture of mankind are constantly exposed to destruction by fire, flood, lack of funds or mechanical damage. Unique libraries, architectural ensembles, whole museums are still being destroyed by wars and natural disasters. The Earth is not the safest place for the elements that civilized humanity wants to save for posterity in their original form. The moon is a much more suitable place to use as a museum and a huge safety zone.

    Replies
    1. Thank you for your response. It's an interesting idea to store data on the Moon. I don't think encryption is a good idea, because if something goes wrong we won't be able to recover the data.

  15. 1. I think the biggest problem lies in the fact that every year there is a huge amount of data to be archived and we must look for a large amount of space for it, whether in the cloud or on private portable disks. The storage and archiving of huge amounts of data is certainly associated with high costs. We should properly select the data that we would like to archive.

    2. We should use coding programs, watermarks, etc. However, we cannot be sure that someone will not break into the "data library" and manipulate the information contained there. In addition, there is often conflicting data on one topic: should we archive all of it or just selected parts? How do we sift out data that has already been manipulated from the data we want to protect and archive?

    3. We should allocate some space for storing the "library" of our data. Of course, we should choose only data that is reliable, true and meaningful for the future history of people and for future generations. Each of us can also create a small library of data and share it with others, so that the memory of each of us will also survive in the virtual world.

    4. As I mentioned, archived data should be reliable, true and valid. We should archive data such as important cultural and sporting events, important achievements and breakthroughs, films and music. It seems to me that archiving private things like what I ate for breakfast on July 2, 2017 is meaningless. As for the private libraries each of us might create, simple moments such as the birth of a child or a first job would matter more, but that depends on each of us. A global data library should contain only information important from the point of view of a given region, country or city.

    Replies
    1. Thank you for your response. Manipulated data is a problem. The author of the data should verify it. We should learn to generate checksums of our data to help people verify it in the future.

  16. 1. What is the biggest problem with archiving digital data?

    I think that archiving digital data is a problem the world has to look at. The data should be safely stored. Appropriate archiving of digital data will allow the data to be used again. IT is being developed all the time, however, and what is popular today does not have to be popular tomorrow. The same goes for data storage: the ways of archiving data and the data formats are changing as well. Another problem worth considering is the amount of data you want to archive. I am not convinced that there is enough space to store the data, even if the data is archived.

    2. How to protect data from being manipulated before archiving it?

    Appropriate storage of the data minimizes the possibility of the data being manipulated. Unauthorized people should have no access to the data. Training the people who have access to the data is also very important. Scanning the data carriers for bots, viruses, etc. is necessary to protect the data. Checking control bits, applying checksums and using special algorithms to detect changes in the data also seems to be a good idea. Knowing how the data is stored also lowers the probability of the data being manipulated. If we have more information about the data, we can prevent its manipulation.

    3. How can we help these people to build a library of our history?
    Data collection may be a very long-term process. I think that the answer to this question is connected with the answer to question number two. As I wrote there, scanning the data carriers is a good idea. The place where we store data can be damaged, so we should have a backup copy of that data in another place. Applying checksums and control bits lets us check that the data has not been modified. The human factor is also important in the data collection and storage process, so it seems reasonable to train the people who have access to the data. We should be sure that the place or institution storing our data has safety certifications.

    4. What data should we be archiving?

    The answer to this question depends on the point of view - everything may be useful. In my opinion, we should archive data which includes valuable information, e.g. scientific achievements or history. I think that we should also archive the data which was used to make scientific discoveries, for example the data used to test algorithms - maybe somebody will want to improve an algorithm and reuse the data. Historical data can be used by future generations.

  17. Thank you for your extensive response. I agree with you. Controlling access to stored data is important. There are many kinds of data that should be archived. I think the biggest problem is licensing. I agree that storage capacity is a problem, but without legal regulations we couldn't save anything that is protected by companies.

  18. 1. What is the biggest problem with archiving digital data?

    As the article shows, the biggest problems are the durability of data storage, copyright and the law. First of all, there are no legal solutions. How should data be stored? What data? Who should have rights to the data?

    2. How to protect data from being manipulated before archiving it?

    One of the methods of data protection is encryption. Another can be the use of a certified digital signature or a checksum. Of course, you should choose the right method for the type of data and the way of archiving.

    3. How can we help these people to build a library of our history?

    We can provide data in digital form: scanned books, audio-books, or our private materials. The longest process is collecting such a large amount of data.

    4. What data should we be archiving?

    I think that historical data comes first, then information of cultural significance. However, the data should be properly classified.

  19. The digitalisation of our resources is an emerging subject that fortunately is being covered by special institutions. In Poland, for example, we have Polona, a Polish digital library which provides digitised books, magazines, graphics, maps, music, fliers and manuscripts from the collections of the National Library of Poland and co-operating institutions. It began its operation in 2006, and it has over two million digitised items (with about 2000 objects added daily). Old books cannot be digitised automatically because of the fragility of old paper. However, there are cool automatic robots that could speed up this task (take a look at https://www.youtube.com/watch?v=03ccxwNssmo)

  20. 1. What is the biggest problem with archiving digital data?
    The cost which is necessary to archive all this data. For ordinary books you need only space to hold them, but digital data needs infrastructure to ensure availability for many users at once.

    2. How to protect data from being manipulated before archiving it?
    Blockchain technology resolves this problem.

    3. How can we help these people to build a library of our history?
    We could send to that library ebooks, programs and other sources which could be archived there. If everyone sends one item, the library will have a huge number of items in a short time.

    4. What data should we be archiving?
    That is hard to answer. It depends. If we have enough space to archive every source, why shouldn't we save them all? Of course we can't save the whole Internet, but the most important digital documents or documentation about today's IT solutions should be archived.
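Several comments, including the one above, point to blockchain as a way to make archived records tamper-evident. The core mechanism can be illustrated with a plain hash chain, in which each entry stores the hash of the previous one. This is a deliberately reduced sketch: a real blockchain also needs distribution and consensus among many parties, which are left out here, and the names `append`, `first_broken_link` and `GENESIS` are hypothetical.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: str) -> str:
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain: list, payload: str) -> None:
    """Add a record that is cryptographically linked to the one before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "prev": prev_hash,
        "payload": payload,
        "hash": entry_hash(prev_hash, payload),
    })


def first_broken_link(chain: list) -> Optional[int]:
    """Return the index of the first inconsistent entry, or None if intact."""
    prev_hash = GENESIS
    for index, entry in enumerate(chain):
        expected = entry_hash(entry["prev"], entry["payload"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return index
        prev_hash = entry["hash"]
    return None


if __name__ == "__main__":
    ledger = []
    for text in ["deposit: diary.txt", "deposit: photo.bin", "migrated format"]:
        append(ledger, text)
    print(first_broken_link(ledger))   # chain is intact
    ledger[1]["payload"] = "deposit: forged.bin"
    print(first_broken_link(ledger))   # the edited entry no longer matches
```

A chain kept by a single party can still be rewritten from the altered entry onward, so the protection in practice comes from many independent holders of the chain, which echoes the earlier comments about not relying on one company.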

  21. 1. What is the biggest problem with archiving digital data?
    According to the article and my personal opinion, the biggest problems are the amount of digital data, the way it should be stored and the legal regulations relating to copyright and intellectual property law. Firstly, there are countless data on the Internet: private, public, belonging to individuals or to big companies… and ensuring that all of it is archived is nearly impossible. Secondly, storage is also problematic: the discs and drives containing such data may be physically or chemically destroyed, and data saved in some formats may not be possible to open in new software. Lastly, there is the matter of the law: regulations all over the world differ from each other, and it would sometimes be very difficult to establish which legal system applies to given data.
    2. How to protect data from being manipulated before archiving it?
    The article mentioned one way: archivists create a digital ‘fingerprint’ of the file, known as a hash. It is a string of letters and numbers which is unique to that file, and any change to the file means that the hash computed next time will no longer match. It can verify whether a copy is the same as the original file.
    3. How can we help these people to build a library of our history?
    I think the main way to help digital librarians is to give them our permission to store data which we publish on the Internet. What’s more, we can try to store some of the data we produce during our lifetime ourselves, save it on disks or pendrives, and then in our will leave it to the companies and offices which take care of archiving digital data.
    4. What data should we be archiving?
    I think that the most important thing is to archive data crucial for science, especially for medicine or chemistry. It is very important to prevent the development of science from taking a step back in the future because of lost material. Next we should focus on archiving cultural achievements: music, books, movies etc. We must also take care to archive traces of our typical way of life, so that future generations have a wide variety of materials to recreate our culture.

  22. There are several problems, some of which have been mentioned in the article. We can divide them into social, technical, legal and financial challenges. We often discuss the last issue, concentrating on the cost of data archiving, i.e. hardware aspects. It also touches on technical problems, concerning the exact solutions as well as the space needed for this purpose. But still, this is not the most important thing, because we cannot forget the social and legal aspects. As correctly indicated in the article, much data is in private corporations’ hands, and its amount is growing every day and every hour. Controlling it is a challenge in itself, together with the intellectual property aspects.
    As for protecting data before archiving, I can see some simple methods normally used for this purpose:
    doing back-ups often, using file-level and share-level security, password-protecting documents, using EFS and disc encryption, making use of a public key infrastructure, hiding data with steganography, protecting data in transit with IP security, securing wireless transmissions, using rights management to retain control.
    Answering the question about helping people to build the history, I’d like to point out that we in fact do it whether we want to or not, and we are unaware of doing it. Every day we add new data to the total sum of the information gathered so far. The future result depends on the material being gathered online, and it will be a consequence of social, psychological, economic and technical aspects.
    What we should archive is what is important for us. This is the only reason to do it. After many years, it will be clear what was significant for the population…
