Monday, 6 May 2019

Week 4 [06-12.05.2019] Language models too dangerous?

Some time ago OpenAI announced a new, better English language model: GPT-2, the successor to its previous GPT model. It can pick up from a human-provided text snippet and continue it coherently in a similar style. Other applications include text summarization and question answering.


Upon the announcement, the researchers stated that, due to safety concerns, they had decided to withhold the full model and release only a smaller one. In a recent update, they propose a staged, gradual release and a partnership with the security and AI communities to prepare society for the full models.

1. Do you think it's a valid concern that people are not ready for a deluge of AI-generated fake news and comments in propaganda wars? Or is the whole withholding just a publicity stunt to gain more news coverage?
2. Is it inevitable that sometime in the future AI researchers - just like security researchers now - will be following responsible disclosure practices?
3. Do you think that further development of statistical language models might result in "true" understanding of the meaning?
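
To make "statistical language model" a bit more concrete, here is a minimal sketch of the simplest kind: a bigram model that counts which word follows which and then continues a prompt by sampling from those counts. This is not how GPT-2 works (GPT-2 is a large neural network); it is a much simpler technique, shown only for illustration. The function names and the toy corpus are invented for this example.

```python
import random
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it and how often."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers


def continue_text(followers, prompt, max_new_words=10, seed=0):
    """Extend the prompt by sampling each next word from the counted followers."""
    rng = random.Random(seed)
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_new_words):
        candidates = followers.get(words[-1])
        if not candidates:  # the last word was never seen with a follower
            break
        choices, weights = zip(*candidates.items())
        words.append(rng.choices(choices, weights=weights, k=1)[0])
    return " ".join(words)


# Hypothetical toy corpus, invented for this illustration.
corpus = "the model reads text . the model writes text . the reader reads news ."
model = train_bigram_model(corpus)
print(continue_text(model, "the model", max_new_words=5))
```

The continuation is always locally plausible, because every word pair in it was seen in the corpus, but nothing in the model represents what the words mean, which is roughly what question 3 is asking about.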

30 comments:

  1. 1. Do you think it's a valid concern that people are not ready for a deluge of AI-generated fake news and comments in propaganda wars? Or is the whole withholding just a publicity stunt to gain more news coverage?

    It is definitely not even possible in every language; as we all know, they focus on English. Reaching the level of a person who has used a language all their life, including slang, sayings and words that are not in dictionaries, is extremely difficult and time-consuming.

    Bearing in mind the still relatively low level of knowledge and skills of the average person when it comes to technology, I think that people may not be ready. But it is not only about being ready: some people simply do not want such technologies, and it will be difficult to overcome that psychological barrier.

    2. Is it inevitable that sometime in the future AI researchers - just like security researchers now - will be following responsible disclosure practices?

    Already today I see many gaps and abuses, both in new research and in older work. The greater the use of technology, the greater the scope for cheaters who want to take advantage of the opportunity. Unfortunately, this will be a problem in every field. It is very difficult to obtain reliable data and research results, and there will always be research sponsors who carry out their research in such a way that their company is shown in the best light.

    3. Do you think that further development of statistical language models might result in "true" understanding of the meaning?

    I do not know if this method is the best. In addition to the problems I mentioned under the first question, there are still issues such as sarcasm, jokes, jargon, and many more. I cannot say which technique should be used.

    Replies
    1. Yes, there is a huge difference in the amount of learning material available depending on the language. The difference gets even larger for material that is manually annotated and clean from a legal point of view.

  2. 1. Do you think it's a valid concern that people are not ready for a deluge of AI-generated fake news and comments in propaganda wars? Or is the whole withholding just a publicity stunt to gain more news coverage?

    Well, people believe not only in artificial junk news from the internet. They also believe the news served on radio and television. I'm curious how many people believed the April 1 news that we would get 300 or 500 zlotys (I don't remember which) per month for each cat. But only in Wroclaw :-D. Or remember that American movie from the '80s about the invasion of aliens? People aren't ready for this information.

    2. Is it inevitable that sometime in the future AI researchers - just like security researchers now - will be following responsible disclosure practices?

    Inevitable! Not only because scientists don't want to reveal all their discoveries (apparently the water engine was invented long ago, but the oil industry is paying to keep it hidden), but also because we are often not ready for a discovery. It's enough to analyze Leonardo da Vinci's discoveries, or read these ... books written by Verne. I won't mention the early works of the Brothers Grimm ;-)

    3. Do you think that further development of statistical language models might result in "true" understanding of the meaning?

    Yes, one day someone will make machines understand language: intonation, emotions, etc. It will work! I'm sure.

    Replies
    1. What I meant in point 2 is that now, in security research circles, when a security flaw is discovered, it is customary to quietly notify the responsible parties to give them some time to implement and deploy a fix. After this time ends, the embargo is lifted and the bug is announced to the public.
      But in this case, who would be notified about new dangerous research results?

  3. I'm sure that not publishing the models, and saying that this was for public safety, was done entirely to generate buzz. For example, FakeApp for photorealistic face swapping (https://www.youtube.com/watch?v=ZZMJHErmGSA) is publicly available, there are some tutorials on how to use it, and it looks quite simple. Still, the world isn't flooded with any more fake news than news agencies were serving us before.

    Yes, it seems that a new field is emerging in AI, focused on moral values and similar aspects that should be taken into account when working on AI solutions: https://distill.pub/2019/safety-needs-social-scientists/
    I don't know how much momentum it can gain. Personally I'm not interested in such things and I don't have any moral concerns when it comes to AI/Data Science.

    I think that statistical models aren't the way to achieve understanding. We would need a totally different paradigm, probably something mimicking processes in the human brain, if we want real understanding. Probably we don't even fully understand what is necessary to "really understand" something.

    Replies
    1. Thanks for this piece on Distill, it was quite an interesting read. It's something deeply American to settle the truth by who "wins a debate" :-)

  4. 1. Do you think it's a valid concern that people are not ready for a deluge of AI-generated fake news and comments in propaganda wars? Or is the whole withholding just a publicity stunt to gain more news coverage?
    It's already happening. Deep AI algorithms are implemented everywhere, and they give us what we would like to see: data consistent with our profile, and what the owners of these companies would like us to see. The second element is the conscious shaping of society. A hypothetical SkyNet cannot be stopped.
    2. Is it inevitable that sometime in the future AI researchers - just like security researchers now - will be following responsible disclosure practices?

    Hiding innovative concepts in drawers is no use because new concepts are based on existing concepts and their discovery is inevitable. If not, someone else will become their "discoverer". Great minds think alike. The collective knowledge of mankind is the result of a collective search for available data. If OpenAI decided to make such a move, it is most likely a happening, a social action aimed at making people aware of the dangers associated with machine-generated fake news.

    3. Do you think that further development of statistical language models might result in "true" understanding of the meaning?
    The basic category used in quantitative statistical calculations is absolute frequency. It is a numerical indicator obtained by summing up the units in a given sample. And I have the impression that the complexities of language go much deeper than that.

    Replies
    1. Hello! The concept of inevitable discoveries, and even further - inevitable inventions, is very interesting. How much of the path of technical civilization is a product of fundamental constants of the universe, of local geographic conditions, and of pure accident?

  5. 1. Do you think it's a valid concern that people are not ready for a deluge of AI-generated fake news and comments in propaganda wars? Or is the whole withholding just a publicity stunt to gain more news coverage?

    Certainly it is in part a move to get more publicity. However, the topic is very interesting. Work is currently in progress on detecting fake news, and AI is used there as well. I am curious how such systems will cope with fake texts generated by artificial intelligence. If the texts generated by such a mechanism are really of very good quality, then not sharing the whole model is a good move. Even without it, the internet is full of false information.

    2. Is it inevitable that sometime in the future AI researchers - just like security researchers now - will be following responsible disclosure practices?

    Of course, I think it is inevitable, considering in how many areas there are attempts to use AI and how often sensitive data is involved.

    3. Do you think that further development of statistical language models might result in "true" understanding of the meaning?

    I think that statistical models will never fully understand the meaning of a text. Language is very flexible and changes relatively quickly, especially on the Internet. I believe that neural network models will be able to do this sooner.

    Replies
    1. A fake news generator coupled with a fake news detector? Seems like some distributed global GAN architecture resulting in an arms race.

  6. 1. Do you think it's a valid concern that people are not ready for a deluge of AI-generated fake news and comments in propaganda wars? Or is the whole withholding just a publicity stunt to gain more news coverage?
    I think it's very important, because today we can find a lot of fake news on the Internet. The decision to publish only a small part of the work is a safe way to share results with others and avoid the potential risk that the work will be used in the wrong way, e.g. to generate fake news. I think people are not well prepared to handle it, and this could be very dangerous for everyone who searches for valid information. It will be difficult to say whether a given piece of news is fake or not. In the wrong hands, this solution will be an easy, low-cost way to generate spam.

    2. Is it inevitable that sometime in the future AI researchers - just like security researchers now - will be following responsible disclosure practices?
    I think it's possible, because there are good and bad people. Someone will use a researcher's work to help people, and someone else will use it for the wrong goal. It will be important to think about the consequences of publishing our work. Sometimes it is better not to share with everyone, but with a small group of researchers who would improve our work to help people.

    3. Do you think that further development of statistical language models might result in "true" understanding of the meaning?
    I think it could happen. Today we see big improvements in this area. The results published in the article are very good. I think that in the near future AI will be much better.

    Replies
    1. As the authors of the model state, it's just bigger and better than the previous one. So by withholding, they are just buying some time. What to do with this time? Maybe some global public service announcements?

  7. 1. Currently, it is not only bots that generate a lot of false information on the Internet. People do it just as readily, for their own benefit or recommendation. I think that even politicians know how to do it, and that it is effective in getting a large number of people to believe their lies. People sometimes take what they read on the Internet as the only truth, without even looking at who wrote the message, so it does not matter whether it's an algorithm or another human being.

    2. Of course. The information and technology race is in full swing, and not everyone wants to reveal their research to a wider group; even if they do, part of it stays secret. If a large sponsor is behind the research, it may not want to reveal sensitive information that could threaten it.

    3. I think that there is still a long way to go to understand the "real" meaning, especially because people sometimes do not know what someone else meant. It remains to be worked out how to understand slang, sarcasm, jargon used in narrow interest groups, and the shifts of meaning present in colloquial language. Maybe someday it will work; whether it will be with just statistical models, I do not know.

    Replies
    1. As for a large sponsor, Elon Musk decided to exit OpenAI (which he helped to fund) in February 2019, most probably because of this research :-)

  8. Hi! Thank you for an interesting subject to discuss!

    1. I think that it was a kind of marketing move to spread the news widely. However, I do believe that this neural net is able to produce text that is lexically, syntactically and semantically correct. Personally, I do not see a threat in that by itself. I know that this kind of solution could enhance and speed up the generation of fake news, but it still needs to be used by a human, who would use it just like any other tool.

    2. Sure, I think that there would be three camps, just like in IT security: white / grey / black hats. And every camp would be competing with the others' approaches.

    3. You mean in the sense of "strong AI"? Personally, I think not, because another breakthrough in AI design would be needed to achieve that level of natural language understanding.

    Replies
    1. 1 - I've just remembered this thing with YouTube channels serving algorithmic rubbish content to children. Now with such a model, that would be really evil!
      3 - Yes, precisely - I mean it in the strong AI context. I wonder if - given enough data - statistical models would converge to something like strong AI.

  9. 1. Do you think it's a valid concern that people are not ready for a deluge of AI-generated fake news and comments in propaganda wars? Or is the whole withholding just a publicity stunt to gain more news coverage?

    People believe everything they hear, read and see, regardless of whether it's on the Internet, television or radio. So in my humble opinion, they are definitely not ready for such pieces of information.

    2. Is it inevitable that sometime in the future AI researchers - just like security researchers now - will be following responsible disclosure practices?

    The problem of obtaining reliable data and research results is a sad fact that is a sign of modern times. Responsible disclosure procedures are really needed, but whether they will really be followed is a totally different kettle of fish.

    3. Do you think that further development of statistical language models might result in "true" understanding of the meaning?

    Not really. At least, not yet. There are still many stylistic features of the language that cannot be understood on the basis of the statistical language models. Sarcasm, jokes, white lies, certain rhetorical figures.

    Replies
    1. 2 - It reminds me a bit of Chinese hardware/software hacker culture - it's not Open Source in spirit, but something similar, with unofficial exchange of source code, designs, ideas between people who personally know each other.

  10. 1. Do you think it's a valid concern that people are not ready for a deluge of AI-generated fake news and comments in propaganda wars? Or is the whole withholding just a publicity stunt to gain more news coverage?

    In my opinion, each statement carries some information that can be used for various purposes. Artificial intelligence needs a lot of time and examples to learn the language, way of expression, or words characteristic of a given person. In addition, there is such a wealth of information around that it is hard to pick out relevant content. Quite frequently, the focus is on the quantity, not the quality, of the message.

    2. Is it inevitable that sometime in the future AI researchers - just like security researchers now - will be following responsible disclosure practices?

    Information is priceless in today's world. Certainly, the development of information technology will accelerate in many areas. Research into artificial intelligence and its use in many areas of life is intense. Certainly, even in the scientific community, as in all others, there will be people wanting to earn from modern technologies. Science is not just about solving problems.

    3. Do you think that further development of statistical language models might result in "true" understanding of the meaning?

    Certainly, the more we train the system on "realistic" data, the better results we get. Further development of language models will certainly help to improve the understanding of meaning. I think that we will not be able to achieve 100%, because we would have to read the thoughts of the interlocutor faultlessly, and as things stand at present, that is unachievable.

    Replies
    1. I think that we are talking about personal characteristics, slang etc. too early - the distance from a 5-year-old to the most sophisticated language fencers might be smaller than the distance from current AI models to a 5-year-old kid.

      I think that before we reach personal characteristics, ways of expression, there is grand slope of reach

  11. This comment has been removed by the author.

  12. 1. I think that you don’t need AI technology to flood people with false information. I don’t want to give examples, but all you have to do is look around to find that people are able to believe anything.

    2. I think that errors in software will always appear, and there will always be people who will catch these errors. Some will want to make a profit, and others, guided by ethics, will follow established disclosure practices.

    3. If a human is able to process natural language with understanding, then machines will sooner or later also be able to do it.
    Personally, I am impressed by how much translation of texts from one language to another has improved over the last 10 years, which is also related to the subject of natural language processing.

    Replies
    1. Is understanding of language really necessary for translation? Couldn't we map sentences from one language to another using only a purely statistical model?

  13. 1. Do you think it's a valid concern that people are not ready for a deluge of AI-generated fake news and comments in propaganda wars? Or is the whole withholding just a publicity stunt to gain more news coverage?
    While OpenAI is focused on ethical implementations and won't knowingly enable fake news, it's just one organization. There's a larger concern that an unscrupulous company or a hostile government might develop a powerful AI that disseminates falsehoods on a large scale. Social networks have enjoyed some success in fighting fake news, but they might struggle if there's a flood of machine-generated misinformation.
    2. Is it inevitable that sometime in the future AI researchers - just like security researchers now - will be following responsible disclosure practices?
    Spam and fake news are two other obvious potential downsides, as is the AI’s unfiltered nature. As it is trained on the internet, it is not hard to encourage it to generate bigoted text, conspiracy theories and so on.
    3. Do you think that further development of statistical language models might result in "true" understanding of the meaning?
    Language modeling is central to many important natural language processing tasks. Recently, neural-network-based language models have demonstrated better performance than classical methods both standalone and as part of more challenging natural language processing tasks. Further, languages change, word usages change: it is a moving target.

    Replies
    1. 3 - Yes, it also happens for example in image recognition and board games. Some part of me still hopes that neural networks will eventually hit the wall and we'll have to get back to classics :->

  14. 1. Do you think it's a valid concern that people are not ready for a deluge of AI-generated fake news and comments in propaganda wars? Or is the whole withholding just a publicity stunt to gain more news coverage?
    No one knows it. It should be verified. In my opinion, there is no shortage of trolling people in social media. So what's the difference whether a man or a machine does it? I think that the restriction of the code has become a good advertisement.

    2. Is it inevitable that sometime in the future AI researchers - just like security researchers now - will be following responsible disclosure practices?
    Responsible disclosure practices apply to every IT guy who has access to important data. Each profession has its own practices that create the ethical area for the employee’s deeds. Ultimately, these are just practices. If they are not covered by rules or law, then respecting them is subjective.

    3. Do you think that further development of statistical language models might result in "true" understanding of the meaning?
    This will definitely help but not completely. There will always be cases in which it is impossible to understand the proper meaning without additional knowledge or context. Besides, I'm afraid that the statistical language models are slowly becoming a part of history, giving way to new methods.

    Replies
    1. Hello! As per 1., the difference might be the practically infinite scale in the case of machine-generated messages. For messages generated by people there is a natural limit (keystrokes? sleep?)

  15. 1. Do you think it's a valid concern that people are not ready for a deluge of AI-generated fake news and comments in propaganda wars? Or is the whole withholding just a publicity stunt to gain more news coverage?
    Well-prepared fake news is difficult to distinguish from real news, especially if it is copied by many sources or contains some partially true information. It is dangerous that AI can generate fake news, because it may then be copied or processed so many times that it becomes the believed "truth".
    2. Is it inevitable that sometime in the future AI researchers - just like security researchers now - will be following responsible disclosure practices?
    I think it is possible, because many don't want to reveal how something is done, especially when they earn a lot of money from it. Anyway, I think that research in this field shouldn't be hidden.
    3. Do you think that further development of statistical language models might result in "true" understanding of the meaning?

    Maybe not with statistical language models alone, but in cooperation with other fields and resources like knowledge bases, information extraction etc., it will happen.

  16. 1. Do you think it's a valid concern that people are not ready for a deluge of AI-generated fake news and comments in propaganda wars? Or is the whole withholding just a publicity stunt to gain more news coverage?
    I think that this concern deserves attention, as social networks are becoming a tool for shaping opinion. They have already been used this way multiple times; the last election in the US is an example, when PR specialists targeted campaign materials at specific people based on the Facebook groups they were in and the activities they performed on Facebook. If an efficient and fast model able to generate human-like texts is invented, its influence on people's opinions can be very high, as it will be very hard to distinguish between fake news and true facts.
    2. Is it inevitable that sometime in the future AI researchers - just like security researchers now - will be following responsible disclosure practices?
    I think it is the fate of any potentially dangerous technology: when it becomes a tool that can be used for some threatening purpose, the relevant authorities start regulating all the scientific and technological activities related to this technology, and, in my opinion, AI is no exception in this respect.
    3. Do you think that further development of statistical language models might result in "true" understanding of the meaning?
    I do not have much experience with NLP (natural language processing); what I can presume is that statistical language models are built on the basis of huge parallel corpora of several languages, e.g. Polish and English. In these corpora a one- or many-word phrase in one language has a corresponding phrase in the other language, so the model learns the most probable translation for specific phrases. To my mind, the idea of sequence-to-sequence translation, shown in the article by I. Sutskever et al., is more promising in this respect. The authors demonstrate a model based on the LSTM architecture that is able to transform a sentence in one language into a vector in a 1000-dimensional space. This space plays the role of an "intermediate common language", and sentences with close meanings are transformed into vectors that are close to each other in this space, so, in my opinion, this model is closer to the "true understanding" of the meaning.

  17. Hi,
    In my opinion, such a strategy of secrecy is only a publicity stunt to gain more interest and to make the issue more famous. Or maybe they are not ready with the complete proposal? Maybe they want to examine the public and adjust their product to the opinions gathered, and then introduce something that corresponds better to common expectations.
    As for the second question, it is possible in our world, where you can copy and follow everything, using it for unpredictable purposes and in atypical ways. AI research touches more and more sensitive issues, and has an influence on processes and systems used both in the professional and in the everyday world. That’s why adopting disclosure practices is a probable scenario, so that it would be possible to ensure responsible results of this work.
    Referring to “true” understanding, I have mixed feelings. Speech, or language, is not a stable set of words, rules and meanings. For me this is rather a fluid process, changing in time, depending on many unpredictable factors. So work on interpreting meaning has to adjust to those variables and follow them in real time. Development of statistical language models will go towards such “true” understanding, but it will never achieve it, in my opinion.
    br, mp
