Wednesday, 21 November 2018

Week 3 [19-25.11.18] Distributed Systems

Hello everyone,
I would like to present an article titled "A Thorough Introduction to Distributed Systems".
The author explains what distributed computing is and how it relates to scaling a database. He also describes
the categories of distributed systems.
After reading the article, please answer the questions:
  1. What is distributed computing?
  2. What does scaling a database have to do with distributed computing?
  3. How can distributed computing help develop science? Give an example.
  4. Have you participated, or do you still participate, in a project using distributed systems?
    If so, please tell us about it.
    
    
    Distributed Systems
     
 

32 comments:

  1. 1. What is distributed computing?

    Distributed computing is a field of computer science that studies distributed systems. A distributed system is a system whose components are located on different networked computers, which then communicate and coordinate their actions by passing messages to one another.

    2. What does scaling a database have to do with distributed computing?

    Scaling out often involves sharding the database across multiple database servers in a distributed cluster, while scaling up involves increasing the computing power and resources of the database server.

    3. How can distributed computing help develop science? Give an example.

    In many ways. There is a Wikipedia page listing multiple examples:
    https://en.wikipedia.org/wiki/List_of_distributed_computing_projects

    Among them you can find projects that analyse ways to improve climate prediction models, research cardiac electrophysiology, or search for the most efficient method of hydrogen production.

    4. Have you participated, or do you still participate, in a project using distributed systems?

    No, I have never participated in a project using distributed systems.

    Replies
    1. Thank you for your answer. Big databases use distributed systems to respond faster and to protect against failures. The list of distributed computing projects is very useful to me.

  2. 1. What is distributed computing?
    Distributed computing is a technique of splitting an enormous task into many smaller tasks, each of which can be done by a single computer. The smaller tasks are then carried out in parallel.

    2. What does scaling a database have to do with distributed computing?
    As far as I’m concerned, these issues are connected, and together they make it possible to process Big Data quickly. Scaling a database also makes it possible to handle a significant increase in queries, or in the data itself, without losing performance.

    3. How can distributed computing help develop science? Give an example.
    As Cezary wrote, there are a lot of such projects. I have heard of only a few of them, for example earthquake prediction or weather estimation.

    4. Have you participated, or do you still participate, in a project using distributed systems? If so, please tell us about it.
    Unfortunately, I haven’t participated in such projects.
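
The splitting described in the first answer of this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`split`, `work`, `distributed_sum_of_squares`) are made up for this example, and threads on one machine stand in for the separate computers a real distributed system would use.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, parts):
    """Cut a list into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def work(chunk):
    """The small task one worker handles: sum the squares of its chunk."""
    return sum(x * x for x in chunk)


def distributed_sum_of_squares(items, workers=4):
    """Split the big task, run the small tasks in parallel, merge the results."""
    chunks = split(items, workers)
    # Assumption: a thread pool stands in for a pool of networked machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(work, chunks))
    return sum(partials)
```

Calling `distributed_sum_of_squares(list(range(10)), workers=3)` gives the same answer as summing the squares on one machine; only the way the work is divided changes.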

    Replies
    1. Thank you for your answer. Large databases must handle many queries at the same time. Without dividing them among a large number of nodes, that would be impossible. I encourage you to participate in a project of your choice:
      https://boinc.berkeley.edu/

  4. 2. Scaling a database with distributed computing is the cheapest way to boost performance and to process many more concurrent queries. Moreover, it gives you fault tolerance and sometimes also geographically distributed data.

    3. Of course, there are a lot of projects like this. I believe the first one I saw and participated in was SETI@home. It was a project based on volunteer computing that searched for signs of extraterrestrial intelligence in radio signals.

    4. Yes, I have participated, and am still participating, in multiple applications of distributed systems. I believe the biggest one was Smart City Dubai, where we built a Hadoop cluster to store and process different types of city data, such as public transportation, energy consumption and so on.

    Replies
    1. Thank you for your answer. Scaling a database is also very important with regard to location (the signal travels the shortest path). I am very interested in the Smart City Dubai project. Can you tell me more about it? How much data is produced per unit of time?

  5. 1. What is distributed computing?

    Distributed computing is a field of computer science that studies distributed systems. A distributed system is a group of computers working together so as to appear as a single computer to the end user. These machines have a shared state, operate concurrently and can fail independently without affecting the whole system’s uptime.

    2. What does scaling a database have to do with distributed computing?

    Scaling a database helps to improve availability and performance when demand changes, especially when the changes are unpredictable. We can increase the computer's resources or add another computer with a database slave to improve access to our database.

    3. How can distributed computing help develop science?

    Distributed computing can help researchers get results much faster. There are many projects on the internet that use distributed computing to solve complicated problems. Cezary mentioned them in his response to this article on this blog.

    4. Have you participated, or do you still participate, in a project using distributed systems?

    In our research, we tried distributed computing using a MATLAB toolbox designed for this purpose.

    Replies
    1. Thank you for your answer. Big databases receive many queries at any time and must handle them all. Please tell me more about the MATLAB toolbox. How is this toolbox designed?

  6. 1. What is distributed computing?
    Distributed computing is a model in which components of a software system are shared among multiple computers to improve efficiency and performance. According to the narrowest of definitions, distributed computing is limited to programs with components shared among computers within a limited geographic area. Broader definitions include shared tasks as well as program components.

    2. What does scaling a database have to do with distributed computing?
    Technologies like Hadoop and NoSQL fit into modern distributed architectures in a way that solves scalability and performance problems.

    3. How can distributed computing help develop science? Give an example.
    Distributed systems such as Hadoop or Spark can be used for data analysis.

    4. Have you participated, or do you still participate, in a project using distributed systems? If so, please tell us about it.
    Unfortunately, I have never been part of a team that used distributed systems.

    Replies
    1. Thanks for your answer. I like the comparison of distributed systems to a model. Many factors affect distributed computing, for example: location, number of computers, size and dispersion of the database, complexity of the problem, algorithms used, etc. I encourage you to participate in a project of your choice.

  7. 1. What is distributed computing?

    The definition is ... a field of computer science that studies distributed systems. A distributed system is a system whose components are located on different networked computers, which then communicate and coordinate their actions by passing messages to one another. The components interact with one another in order to achieve a common goal. Three significant characteristics of distributed systems are: concurrency of components, lack of a global clock, and independent failure of components. Examples of distributed systems vary from SOA-based systems to massively multiplayer online games to peer-to-peer applications.

    2. What does scaling a database have to do with distributed computing?

    Yes ... Scalability (scaling) is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth. For example, a system is considered scalable if it is capable of increasing its total output under an increased load when resources (typically hardware) are added. An analogous meaning is implied when the word is used in an economic context, where a company's scalability implies that the underlying business model offers the potential for economic growth within the company.

    3. How can distributed computing help develop science? Give an example.

    I do not know, but I hope that distributed computing will allow breaking passwords and decoding messages on Skype, Messenger or WhatsApp.

    4. Have you participated, or do you still participate, in a project using distributed systems?
    If so, please tell us about it.

    Unfortunately, I haven't participated.

    Replies
    1. Thanks for your answer. Distributed systems improve application performance and increase computing power. If you want to participate in a project that decrypts information, I suggest Enigma@Home. You can find some information about this project on this site:
      http://www.enigmaathome.net/

  8. 1. What is distributed computing?
    Distributed computing is a model in which components of a software system are shared among multiple computers to improve efficiency and performance. By the narrowest definition, it is limited to programs whose components or tasks are shared among computers within a limited geographic area. In the simplest terms, distributed computing just means that something is shared among multiple systems, which may also be in different locations. It’s also the key to the influx of Big Data processing.

    2. What does scaling a database have to do with distributed computing?
    Scalability refers to the capability of a system to handle a growing amount of work, or its potential to perform more total work in the same elapsed time when processing power is expanded to accommodate growth. Distributed computing is what makes this possible.

    3. How can distributed computing help develop science? Give an example.
    There are many projects that involve distributed computing. For example, ATLAS@Home is a research project that uses volunteer computing to run simulations of the ATLAS experiment at CERN.

    4. Have you participated, or do you still participate, in a project using distributed systems?
    If so, please tell us about it.
    Unfortunately, I haven’t participated in such projects.

    Replies
    1. Thanks for your answer. Regarding distributed computing: "[...] It’s also the key to the influx of Big Data processing." I agree with you, because we now have huge amounts of data that we could not process otherwise. I encourage you to participate in a project of your choice to help science.


  9. 1. What is distributed computing?

    The simplest definition would be: computing done on multiple interconnected computers.
    Some supercomputers and clusters meet this definition, as do cryptocurrency mining pools.

    2. What does scaling a database have to do with distributed computing?

    A central database can be a bottleneck of a distributed system. This can be addressed by using replication, load-balancing, eventual consistency, sharding etc. The actual method must match the task - a storage backend for an HPC physics simulation will be different from cloud storage for a social media site.

    3. How can distributed computing help develop science? Give an example.

    GIMPS - the project searching for prime numbers, Folding@Home - simulating protein folding, and many other projects open to the general public.
    At the high end, the large supercomputers are mostly clusters. Moore's Law has worked more or less until now (though I think we're just pretending that it still applies, by revising down the time constant of this exponential function. But what if it is just a sigmoid?).
    Still, the only way around its limitations has always been to "go parallel", and not wait another 2 years for the power to double. And Moore's Law only
    allows for the power to double, while going distributed allows for much larger factors - given enough money, the sky is the limit. Just ask Google.

    4. Have you participated, or do you still participate, in a project using distributed systems?
    If so, please tell us about it.

    Not really. I was supposed to use a cluster with MPI (Message Passing Interface) back in 2010, but I never actually went beyond multicore (OpenMP threading). On the other hand, a Web application could be considered a distributed computing example, especially the current rich ones, which do a lot of processing on the client side.

    Replies
    1. Thanks for your answer. You rightly noted that the distributed computing method should be chosen to match the problem. In my opinion, Moore's Law has been out of date for five years, because that is when a lot of applications (phone applications) began to be designed for distributed systems. I encourage you to participate in a project of your choice to help science.

  10. 1. What is distributed computing?
    Distributed computing is a computing concept that, in its most general sense, refers to multiple computer systems working on a single problem. In distributed computing, a single problem is divided into many parts, and each part is solved by different computers. As long as the computers are networked, they can communicate with each other to solve the problem. If done properly, the computers perform like a single entity.

    2. What does scaling a database have to do with distributed computing?
    Achieving scalability and elasticity is a huge challenge for relational databases. The enhancements to relational databases also come with big trade-offs. For example, when data is distributed across a relational database, the distribution is typically based on pre-defined queries in order to maintain performance. In other words, flexibility is sacrificed for performance. Additionally, relational databases are not designed to scale back down: they are highly inelastic. Once data has been distributed and additional space allocated, it is almost impossible to “undistribute” that data.

    3. How can distributed computing help develop science? Give an example.
    I found a page on Wikipedia listing a lot of interesting projects that use distributed computing.
    https://en.wikipedia.org/wiki/List_of_distributed_computing_projects

    4. Have you participated, or do you still participate, in a project using distributed systems? If so, please tell us about it.
    I have not had the opportunity to participate in any project that benefited from distributed computing.

  11. 1. What is distributed computing?

    A distributed computer system (DCS) is a collection of computers connected by a communications subnet and logically integrated in varying degrees by a distributed operating system and/or distributed database system. Each computer node may be a uniprocessor, or multiprocessor, or a multicomputer. The communication subnet may be a widely geographically dispersed collection of communication processors or a local area network. Typical applications that use distributed computing include e-mail, teleconferencing, electronic funds transfers, multi-media telecommunications, command and control systems, and support for general purpose computing in industrial and academic settings.

    2. What does scaling a database have to do with distributed computing?
    To scale a database, you need to increase its computing power and its resources.

    3. How can distributed computing help develop science? Give an example.

    I agree with Cezary Góralski's answer. All my examples would coincide with the projects listed on Wikipedia (the link that Cezary Góralski gave).

    4. Have you participated, or do you still participate, in a project using distributed systems?
    Unfortunately not, but I'm interested in this topic.

  12. 1. Distributed computing is a computing concept that involves many computer systems working on a single problem. Such a model improves efficiency and performance when it comes to, e.g., large-scale data processing. The computers in a distributed system can be physically close together or geographically dispersed. The most important thing is that they operate as a single system.
    2. One can make use of both things to process big data in a quicker and more efficient way.
    3. There are many fields of science that can make use of distributed computing. Actually, to be accurate, all branches that involve data analysis.
    4. No, not really.

  13. Hi Artur,
    Thanks for raising a substantive topic.
    As for your questions:

    1. Distributed computing is a field of computer science concerned with distributed systems. A distributed system is a collection of independent computers that appears to its users as a single coherent system.
    Instead of going further with definitions, it is perhaps more useful to concentrate on important characteristics of distributed systems. One important characteristic is that differences between the various computers and the ways in which they communicate are mostly hidden from users. The same holds for the internal organization of the distributed system. Another important characteristic is that users and applications can interact with a distributed system in a consistent and uniform way, regardless of where and when interaction takes place.

    2. When a system needs to scale, very different types of problems need to be solved. Let us first consider scaling with respect to size. If more users or resources need to be supported, we are often confronted with the limitations of centralized services, data, and algorithms. For example, many services are centralized in the sense that they are implemented by means of only a single server running on a specific machine in the distributed system. The problem with this scheme is obvious: the server can become a bottleneck as the number of users and applications grows. Even if we have virtually unlimited processing and storage capacity, communication with that server will eventually prohibit further growth.
    When considering scaling techniques, one could argue that size scalability is the least problematic from a technical point of view. In many cases, simply increasing the capacity of a machine will save the day (at least temporarily and perhaps at significant costs). Geographical scalability is a much tougher problem as Mother Nature is getting in our way. Nevertheless, practice shows that combining distribution, replication, and caching techniques with different forms of consistency will often prove sufficient in many cases. Finally, administrative scalability seems to be the most difficult one, partly also because we need to solve nontechnical problems (e.g., politics of organizations and human collaboration). Nevertheless, progress has been made in this area, by simply ignoring administrative domains. The introduction and now widespread use of peer-to-peer technology demonstrates what can be achieved if end users simply take over control. However, let it be clear that peer-to-peer technology can at best be only a partial solution to solving administrative scalability. Eventually, it will have to be dealt with.

    3. Distributed systems form a rapidly changing field of computer science. New subjects being discussed are, for example, sensor networks, virtualization, server clusters, and Grid computing. Especially worth paying attention to is the self-management of distributed systems, an increasingly important topic as systems continue to scale.

    4. I am very sorry, but unfortunately not yet.

    Kind regards, Marta

  14. Ad1.
    Distributed computing can be defined as a group of computers that work together at the backend while appearing as one to the end user. The individual computers in such groups operate concurrently and allow the whole system to keep working if one or some of them fail.
    Ad2.
    The scale-out option implies a distributed system whereby additional machines are added to a cluster to provide additional capacity. It's often more likely to yield a linear increase in scalability, although not necessarily increased performance.
    Ad3.
    Here is a full list of places where distributed computing is used in science:
    https://en.wikipedia.org/wiki/List_of_distributed_computing_projects
    Ad4.
    Sometimes I share the computing power of my servers to help discover new planets:
    https://news.astronet.pl/index.php/2016/02/22/n7759/
    I am a participant in the project "Wszechświat w Twoim domu" ("The Universe in Your Home").

  15. The definition of a distributed system given in the article is that it is a system connecting multiple machines in such a way that the end user can perform operations on those machines as if they were one machine.

    In the database case, it means that a single database can be copied or divided into smaller ones and kept on multiple servers. Thanks to this, overload caused by too many queries to a single database can be avoided.

    Scaling by adding additional machines/hardware instead of using more powerful hardware in a single machine is cheaper, and it is not limited by the fact that at some point more advanced hardware may not even exist yet. It therefore gives researchers more computing power, allowing them to perform more complex experiments etc.
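
The idea above of dividing a single database into smaller ones kept on multiple servers can be sketched as follows. This is a minimal illustration, not how any particular database does it: the class `ShardedStore` and its methods are hypothetical names, plain dictionaries stand in for the servers, and hashing the key is just one common way to pick a server.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads keys over several 'servers' (dicts)."""

    def __init__(self, shard_count):
        # Assumption: each dict stands in for one database server.
        self.shards = [dict() for _ in range(shard_count)]

    def shard_for(self, key):
        """Pick a shard deterministically from a hash of the key."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        # Only one shard is consulted, so no single server sees every query.
        return self.shards[self.shard_for(key)].get(key)
```

A drawback of this simple modulo scheme is that changing the number of shards moves most keys to a different shard, which is one reason real systems often use more elaborate placement strategies.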

  16. What is distributed computing?
    It is simply a system that provides more computing power to the end user. Such a system is a group of computers working together so as to appear as a single computer to the end user.

    What does scaling a database have to do with distributed computing?
    Scaling a database helps to manage traffic. There are two ways to do it. Scaling horizontally means adding more computers rather than upgrading the hardware of a single one.
    Scaling vertically means upgrading the machine the database is running on.
    How can distributed computing help develop science? Give an example.
    There are many examples. I think a good one is weather forecasting. Many computers could provide a lot of local weather data for global analysis.

    Have you participated, or do you still participate, in a project using distributed systems?
    If so, please tell us about it.

    Unfortunately, no. Thank you for the article and good luck.

  17. 1. What is distributed computing?
    You can find the formal definition on Wikipedia, but generally it is a field of computer science about how to build software that can split processing and take advantage of a network of multiple connected machines to reduce cost and store data more safely.

  18. 1. Distributed computing consists of dividing one task into a series of smaller ones, so that they can be executed in parallel on many computers which individually would not be able to solve it. After the processing has been distributed, it is enough to merge the partial results to get the solution to our original problem. This approach is good because, if we have a bigger task to solve, it is enough to increase the number of nodes doing the parallel processing.

    2. Scalability means that the system is able to do more work in the same amount of time. Distributed computations make it possible to speed up the system, but we must remember that part of the program is always performed in a sequential manner, which we are unable to accelerate.

    3. To take an example from the article, distributed processing is used in a large number of message sending systems, e.g. Amazon SQS, Kafka, RabbitMQ. I think that distributed computing can be used in every field of science, especially in scientific calculations, to speed up the processing of huge amounts of data.

    4. Unfortunately, I have never participated in such a project.
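
The remark in point 2 of this comment, that a sequential part of the program always limits the speed-up, is usually expressed as Amdahl's law. The comment does not name the law, so treating it as the intended model is an assumption; the function name below is also just for illustration.

```python
def amdahl_speedup(parallel_fraction, nodes):
    """Upper bound on speed-up when only part of a program can run in parallel.

    parallel_fraction: share of the run time that can be parallelised (0.0 to 1.0).
    nodes: number of machines (or cores) working on the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time no matter how many nodes we add.
    return 1.0 / (serial_fraction + parallel_fraction / nodes)
```

For a program that is 90% parallelisable, `amdahl_speedup(0.9, 10)` is about 5.26, and no number of nodes can push the result past 10, because the remaining 10% always runs sequentially.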

  19. What is distributed computing?
    Distributed processing is a way of solving problems in which a problem is divided into many smaller blocks that are solved by many computers at the same time. This solution is scalable and very often used to solve problems of high computational complexity. The most difficult part is to correctly separate the problem into blocks that are logically independent. The rest is just concatenating the results from all hosts.

    What does scaling a database have to do with distributed computing?
    A huge database that runs on one host and is queried many times every second will not run smoothly, and if the data changes really quickly, the results may be inconsistent. Distributed computing allows the database to run on multiple hosts simultaneously and, most importantly, to return results much more efficiently.

    How can distributed computing help develop science? Give an example.
    I have heard about a project that looks for prime numbers. I don't know any other example of developing science through distributed computing. I think that it is used more often in commercial projects than in science, but I could be wrong.

    Have you participated, or do you still participate, in a project using distributed systems?
    If so, please tell us about it.
    Unfortunately no, but this topic is really interesting.

  20. What is distributed computing?
    Distributed computing means using many computers that appear to be a single computer to the end user. The machines have some shared state. If one computer fails, that does not affect the whole system.

    What does scaling a database have to do with distributed computing?
    To increase performance, the Master-Slave Replication strategy is used. Two new database servers are created to sync up with the main one. You can only read from these new instances.

    How can distributed computing help develop science? Give an example.
    Distributed computing may be very useful in science, where we have a lot of data and have to compute complicated equations on it very quickly. It can be used to run machine learning algorithms, especially deep learning, quite fast, when doing the same on a single machine would take many days or even years.

    Have you participated, or do you still participate, in a project using distributed systems?
    If so, please tell us about it.
    Unfortunately, I haven't participated in such a project, but I am sure that in the future I will have to.
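
The Master-Slave Replication strategy mentioned in this comment can be sketched like this. It is a toy model under stated assumptions: `ReplicatedDatabase` is a hypothetical class, dictionaries stand in for database servers, writes are copied to the replicas immediately (real systems often replicate asynchronously, so replicas can lag behind), and reads are spread round-robin.

```python
import itertools


class ReplicatedDatabase:
    """Toy primary/replica setup: writes go to the primary, reads to replicas."""

    def __init__(self, replica_count=2):
        self.primary = {}
        # Assumption: each dict stands in for one read-only replica server.
        self.replicas = [dict() for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        """All writes hit the primary, which then syncs every replica."""
        self.primary[key] = value
        for replica in self.replicas:
            replica[key] = value  # synchronous copy, for simplicity only

    def read(self, key):
        """Serve reads from the replicas in turn; return (replica index, value)."""
        index = next(self._next_replica)
        return index, self.replicas[index].get(key)
```

With two replicas, consecutive reads alternate between replica 0 and replica 1, so read traffic is split while every write still passes through the single primary.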

  21. 1. What is distributed computing?
    In general, it is performing computations and synchronizing over many distinct nodes, to get the results faster or more reliably.

    2. What does scaling a database have to do with distributed computing?
    When running a database on a distributed system, some operations might need to operate on data aggregated from more than one node, and that is when distributed computing happens.

    3. How can distributed computing help develop science? Give an example.
    Everywhere a huge amount of data needs to be processed, such as measurements from complex physical experiments, or massive but parallelizable simulations ranging from biology and medicine down to fundamental physics.

    4. Have you participated, or do you still participate, in a project using distributed systems? If so, please tell us about it.
    I don't think so; distributed web applications do not count, I guess.

  22. Ad1.
    Distributed computing means taking a single problem, possibly posed by a single user, and using multiple nodes (computers and their resources, such as RAM and CPU/GPU) to perform the calculations simultaneously, much faster than a single computer could. The main difference between distributed computing and the older view of parallel computing is that in distributed computing we have multiple computers which can communicate with each other while they compute, whereas in the older view of parallel computing we have only multiple processors sharing one memory.

    Ad2.

    You must scale your database horizontally, that is, upgrade the system by adding new computers rather than upgrading the resources of a single one.

    Ad3.

    In our lab we want to use Hadoop for EEG data analysis. We don’t have enough RAM to do it on one computer.
    I have also heard about computing for genome research using the MapReduce framework.
    You can read about it here:
    https://genome.cshlp.org/content/early/2010/08/04/gr.107524.110.abstract

    Ad4.

    Like Andrzej, I also sometimes share the power of my computer with the Universe@Home project, but other than that I haven’t participated in a project using distributed systems.
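
For readers who have not met the MapReduce model mentioned under Ad3, here is a minimal single-machine sketch of its three steps using the classic word-count example. It is not Hadoop's API and not the method of the linked paper; the function names are made up for illustration, and in a real cluster the map and reduce steps would run on different nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: turn one input record into (key, value) pairs."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    pairs = []
    for document in documents:
        # Each document could be mapped on a different node.
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle(pairs))
```

Because each document is mapped independently and each key is reduced independently, both steps can be spread over many machines, which is what makes the model suitable for data that does not fit in the RAM of one computer.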

  23. 1. What is distributed computing?

    Distributed computing is a field of computer science. It studies distributed systems, that is, systems whose components are located on different networked computers. The components communicate with each other and coordinate their actions by passing messages to one another. They interact with one another in order to achieve a common goal. Three characteristics describe distributed systems: concurrency of components, lack of a global clock, and independent failure of components. The problem is divided into many tasks, each of which is solved by one or more computers, which communicate with each other via message passing.

    2. What does scaling a database have to do with distributed computing?

    Scaling increases computing power so that calculations are performed faster. Scalability is a system's potential to perform more total work in the same elapsed time when processing power is expanded to accommodate growth. A system is scalable if it can increase its workload and throughput when additional resources are added.

    3. How can distributed computing help develop science? Give an example.

    There are a lot of distributed computing projects. You can find a list of them here: https://en.wikipedia.org/wiki/List_of_distributed_computing_projects
    I have heard about the Climate Prediction project: https://en.wikipedia.org/wiki/Climateprediction.net

    4. Have you participated, or do you still participate, in a project using distributed systems? If so, please tell us about it.

    Unfortunately, I have not participated in a project using distributed systems, but I hope I will. It would be a good experience.
